This paper is a wonderful example of scientists using the wrong tools to attack a problem. Physicists grow up with wave theory and LOVE to fit sine curves to everything. A lot of phenomena studied by physicists do show periodic behavior, and can be analyzed by fitting sine curves (multiple ones in Fourier analysis) or polynomials. Often such analyses yield powerful results.

But why would anyone think that such an analysis could be applied with any hope of real-world insight to the marine fossil record? We have known for a long time that the fossil record shows major episodes of dramatic extinction, with slower periods of diversity increase (see Jack Sepkoski's data on families from 1981, redrawn as Figures 6.7 to 6.10 in my book). In other words, it's punctuated with major events (catastrophes?) such as mass extinctions. The major fluctuations in fossil diversity are not well described by sine curves or simple polynomials, and obviously do not reflect underlying phenomena that wax and wane in a smooth fashion: certainly not in a cyclic one. You wouldn't try to analyze any time series that contains catastrophic events in this fashion: hurricanes, tornadoes, major earthquakes, major eruptions, and airplane crashes have causes, but the dramatic spikes in cost of property and/or lives are not represented well by smooth curves, nor are these catastrophes directly produced by smoothly operating agents.

Now let's move to what Rohde and Muller actually did. First, they threw away more than half of Sepkoski's data: those genera that happen to have had a short history and/or a small number of fossils. These discarded data are real data, of course, and throwing half the data away may very well bias the analysis, for example against times when marine life was evolving quickly.

Next, Rohde and Muller tried to fit a simple polynomial curve to the remaining diversity data. You really have to look at their curve to realize how inadequately it reflects the actual events of the fossil record. A smooth curve can get nowhere near to representing the dramatic extinctions of the record, so this polynomial has no scientific meaning that I can imagine. Nevertheless, it was meant to represent the major diversity trends in the fossil record.

So what do Rohde and Muller do with this polynomial curve of major trends? They throw it away! Technically, they subtract values along the polynomial from their crippled data set. So once they've done their best to remove any major trends from the fossil record, they analyze what's left by Fourier analysis. There's still no reason why there should be any cycles in the data they have left, but of course there have to be, when Fourier analysis is applied. If you are familiar with Fourier analysis, you'll remember that it actually is forced to produce cycles, no matter what data you look at. What you need to decide after you get the cycles from the Fourier analysis is whether the cycles are real. In the case of Rohde and Muller's Fourier analysis, the 62-m.y. cycle the analysis produces accounts for only 35% of the variance in (what's left of) the data. Is that good or bad? They imply that it's good but don't say so. They do say that the fit is not good because their data has rapid drops in diversity (extinctions) and more gradual rises (recoveries), so does not fit well to sine curves. Well, duh!!!
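If you want to see for yourself that Fourier analysis hands back a "strongest cycle" from any series at all, here is a minimal Python sketch. It is not Rohde and Muller's procedure or their data: the random walk, the seed, the 1-m.y. step, and the helper name `strongest_cycle` are all my own assumptions, chosen only to illustrate the point.

```python
import numpy as np


def strongest_cycle(series, dt=1.0):
    """Return (period, fraction of variance) for the strongest Fourier component."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency term)
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(n, d=dt)
    # Weight the one-sided bins so that they add up to the total
    # variance of the series (Parseval's theorem).
    weights = np.full(power.size, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0                 # the Nyquist bin is not doubled
    energy = weights * power
    energy[0] = 0.0                       # ignore whatever is left at zero frequency
    k = int(np.argmax(energy))
    return 1.0 / freqs[k], energy[k] / energy.sum()


rng = np.random.default_rng(0)            # arbitrary seed (assumption)
# Hypothetical series: 540 steps of 1 m.y. each, with NO cycle built in.
walk = np.cumsum(rng.normal(size=540))
period, fraction = strongest_cycle(walk)
print(f"strongest 'cycle': {period:.1f} m.y., {100 * fraction:.0f}% of variance")
```

The function always returns a period and a share of the variance, whether or not anything periodic generated the series. The number it prints is a description of the data, not evidence of a mechanism.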

In a final insult to the data, Rohde and Muller then take those genera that lasted a long time (> 45 m.y.) and throw them away too. Now the 62-m.y. cycle is stronger. Notice the irony here. They had already thrown away the very short-lived genera; then they did their best to throw away the major trends in the data; now they are throwing away the very long-lived genera. And we are expected to join them in believing that this makes their result stronger, and that the result means something!

Obviously, the rest of the paper has to depend on the analysis so far. Although Rohde and Muller struggle hard to find some, please, any, agent that could generate their 62-m.y. cycle, they cannot. It's my prejudice that their whole exercise is a waste of time, because the statistical tools applied to the data were entirely inappropriate. No doubt the paper (published in Nature, for heaven's sake) will now require an industrial-scale effort to assess it properly (just as Muller's Nemesis idea did), when we could all be doing better things.

I also find it rather worrying that Nature chose a scientist from Muller's own institution to write a (positive) commentary on the paper. So you'll just have to make do with this negative one for now, until an objective (and informed) critique arrives. Note that Kirchner (and Weil) write, "Rohde and Muller demonstrate a 62-m.y. cycle in fossil biodiversity during the Phanerozoic." NO, THEY DO NOT!!! They demonstrate that Fourier analysis (which MUST produce a cycle) found the best cycle it could in the remnants of a crippled data set, and that cycle had a period of 62 million years.

Note that Rohde and Muller say the 62-m.y. cycle is weakest where we know the fossil record best, over the last 150 m.y. That says to me (again) that there is something really wrong with it.

I feel better now. But I really shouldn't leave you here. Where did these cycles come from?

Diversity goes up and down. When you fit a polynomial to fluctuating data, you ask it to leave as little a residual as possible. In other words, you force it to run between the highs and lows, so that the residuals are strung out along a horizontal x-axis, with about as much below the axis as there is above it. The reason you lay the residuals out along the axis is so that you can apply Fourier analysis to them, and you do this because you are hell-bent on perceiving cycles.
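The balancing act described above is easy to check numerically. The sketch below uses made-up numbers (a hypothetical time axis, an invented linear rise plus noise, and a degree-3 polynomial picked purely for illustration), so treat it as a demonstration of least-squares detrending in general, not a reconstruction of the published fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)                      # arbitrary seed (assumption)
age = np.linspace(0.0, 540.0, 200)                  # hypothetical time axis, m.y.
# Invented "diversity" values: a slow rise plus scatter. Not real fossil data.
diversity = 1000.0 + 3.0 * age + rng.normal(0.0, 200.0, age.size)

# Least-squares polynomial trend (degree chosen for illustration only).
trend = Polynomial.fit(age, diversity, deg=3)
residuals = diversity - trend(age)

mean_residual = residuals.mean()                    # zero, up to rounding error
share_above = (residuals > 0).mean()                # fraction of points above the axis

print(f"mean residual: {mean_residual:.2e}")
print(f"share of residuals above zero: {share_above:.2f}")
```

Because the fitted polynomial includes a constant term, least squares forces the residuals to average to zero. That is what strings them out along the horizontal axis, ready for the Fourier step.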

Now a curve that is smooth over the 500+ m.y. of the Phanerozoic is going to cut below the peaks of fossil diversity and above the troughs. In other words, the residuals will (by necessity) contain the diversity swings of the record. When you apply Fourier analysis, you define the best sine curve as the one that leaves the least mess when it in turn is subtracted from the residuals. That forces the best sine curve to pay most attention to (have its wave form anchored by) the lowest lows and the highest highs. You can see what's coming...

The Fourier analysis is nailed down by the biggest extinctions and the biggest peaks of diversity. But because the extinctions are dramatic, they are more important as fixed points than the more gradual rises in diversity. So here we are, counting backwards to the Three Big Ones: the K-T (65 Ma), P-T (250 Ma), and the end-Devonian (more exactly the Frasnian-Famennian, 375 Ma). That's three nail points, and the two intervals between them happen to be 185 and 125 m.y. That's a set-up for a sine curve of 62.5 m.y., giving three and two complete waves between the fixed low points.
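The arithmetic is simple enough to do in your head, but here it is spelled out as a short Python sketch. The ages and the 62.5-m.y. period are the ones quoted above; the variable names are mine.

```python
# Ages of the three big extinction lows, in Ma, as quoted in the text.
lows_ma = {"K-T": 65.0, "P-T": 250.0, "F-F": 375.0}
period = 62.5                                       # candidate cycle length, m.y.

ages = sorted(lows_ma.values())
# Gaps between successive lows, in m.y.
intervals = [older - younger for younger, older in zip(ages, ages[1:])]
# How many cycles of the candidate period fit into each gap.
cycles = [gap / period for gap in intervals]

for gap, n in zip(intervals, cycles):
    print(f"{gap:.0f} m.y. = {n:.2f} cycles of {period} m.y.")
```

The gaps come out at 185 and 125 m.y., which is 2.96 and 2.00 cycles: close enough to whole numbers for a best-fit sine curve to lock on to them.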

Maybe you could weasel around trying to find reasons why major extinctions might have this separation. But consider this next point. To appreciate it fully, you need to look at the published curves of Rohde and Muller, and I'm sorry to say that Nature doesn't allow you Web access unless you (or your institution) have a subscription. When statisticians fit polynomials, all they ask is that the curve adjusts to the raw data to leave as small a set of residuals as possible. In the particular case here, the best polynomial suggests NEGATIVE diversity in the earliest Cambrian: a biological impossibility. Thus, of course, when the biologically impossible polynomial is subtracted from the diversity DATA, it generates completely unrealistic values that cannot have any basis in natural events or processes. To cut a long story short, the calculated residual curve has in it a high in the earliest Cambrian that was generated by this spurious procedure, and a major low in the late Cambrian that is much deeper than the real, rather muted diversity swing in the original data. Thus the Fourier analysis has to accommodate itself to a late-Cambrian low that is as deep as the three major extinction lows later in the Phanerozoic. In other words, the data processing has generated a false "nail point" for the Fourier analysis. And what is the timing of this spurious low? 500 Ma! By a massive quirk of fate, that's exactly 125 m.y. before the FF low, and ensures that the 62.5-m.y. sine curve is confirmed all the way back to the earliest Phanerozoic.
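A toy example shows how an unconstrained least-squares fit can dip below zero even when every data point is non-negative, and how subtracting it then manufactures a "high" at the start of the record. Everything here is an assumption made for illustration: the data are a made-up rising curve on an arbitrary 0-to-1 time scale, and the trend is a straight line (a degree-1 polynomial) rather than whatever polynomial Rohde and Muller actually fitted.

```python
import numpy as np
from numpy.polynomial import Polynomial

t = np.linspace(0.0, 1.0, 101)        # arbitrary time scale, oldest point first
diversity = t ** 2                    # hypothetical counts: start at zero, never negative

# Least-squares trend line. Nothing in the fit knows that diversity cannot be negative.
trend = Polynomial.fit(t, diversity, deg=1)
fitted = trend(t)
residuals = diversity - fitted

print(f"lowest data value:      {diversity.min():.3f}")
print(f"fitted trend at start:  {fitted[0]:.3f}")     # about -0.165
print(f"residual at start:      {residuals[0]:.3f}")  # about +0.165
```

The data begin at zero, the fitted trend begins below zero, and so the residual curve begins with a positive bump that exists nowhere in the data. The fitting procedure created it.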

So in fact, I argue that Rohde and Muller's 62-m.y. cycles are set up by a mixture of accidental coincidence (the timing of the three big extinctions) and spurious data processing (for the fourth big low). No wonder all their tests confirm it as "significant", "robust", etc. But unless you are prepared to argue that the three big extinctions have an underlying 62-m.y. cyclic causation that sometimes works and sometimes doesn't, and that the spurious inflation of the diversity swing in the late Cambrian happens to coincide with some reality, I don't see what you can conclude except that there is no real-world significance to this paper. Now of course, there's a difference between statistical "significance" and real-world significance, but I won't argue that here.

Two final notes. 1. With only one catastrophe after the Permian, fixing the K-T at 65 Ma, the 62-m.y. cycle is automatically less well fitted to the last 150 m.y. of the record than it was in earlier times, just as Rohde and Muller noticed.

2. Why did Rohde and Muller get "better" results when they threw away the short-lived genera? It's because of another feature of Fourier analysis: the amplitude of the sine wave, which is forced to be constant. Most of the Phanerozoic doesn't have major extinctions, so the best sine curve can't have high crests and deep troughs, but has to find the best overall result for both quiet times and dramatic times. If Rohde and Muller had kept the short-lived genera, which obviously were more frequent at times of peak diversity, the highs would have been higher and the best sine curve for all data would have been an even worse fit than the one fitted to the crippled data set.

The paper is Rohde, R. A., and R. A. Muller. 2005. Cycles in fossil diversity. Nature 434: 208-210; and comment, pp. 147-148.

This page composed 12 March 2005; revised 16 March 2005.