Fisher certainly
understood that clearing the significance bar wasn’t the same thing as finding
the truth. He envisioned a richer, more iterative approach, writing in 1926: “A scientific fact should be
regarded as experimentally established only if a properly designed experiment
rarely fails to give this level of significance.”
Not “succeeds
once in giving,” but “rarely fails to give.” A statistically significant
finding gives you a clue, suggesting a promising place to focus your research
energy. The significance test is the detective, not the judge. You know how
when you read an article about a breakthrough finding that this thing causes
that thing, or that thing prevents the other thing, and at the end there’s
always a banal sort of quote from a senior scientist not involved in the study
intoning some very minor variant of “The finding is quite interesting, and
suggests that more research in this direction is needed”? And how you don’t
really even read that part because you think of it as an obligatory warning
without content?
Here’s the
thing: scientists always say that because it’s important and it’s
true! The provocative and oh-so-statistically-significant finding isn’t the
conclusion of the scientific process, but the bare beginning. If a result is
novel and important, other scientists in other laboratories ought to test and
retest the phenomenon and its variants, trying to figure out whether the result
was a one-time fluke or whether it truly meets the Fisherian standard of “rarely fails.” That’s
what scientists call replication; if an effect can’t be replicated, despite
repeated trials, science backs apologetically away. The replication process is
supposed to be science’s immune system, swarming over newly introduced objects
and killing the ones that don’t belong.
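To see what “rarely fails” means in practice, here is a minimal simulation sketch in Python, with invented numbers (fifty subjects per group, an effect size of 0.8; none of this comes from Fisher). A real effect, probed by a properly designed experiment, clears the p < 0.05 bar in nearly every replication; a nonexistent one clears it only about one time in twenty.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def replication_rate(effect, n=50, trials=10_000, alpha=0.05):
    """Fraction of repeated experiments that reach significance.

    Each "experiment" compares a treatment group (mean shifted by
    `effect`) against a control group using a two-sample t-test.
    """
    hits = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        _, p = stats.ttest_ind(treated, control)
        if p < alpha:
            hits += 1
    return hits / trials

# A real effect "rarely fails" to give this level of significance...
print(replication_rate(effect=0.8))  # roughly 0.98
# ...while a fluke succeeds only by luck, about one time in twenty.
print(replication_rate(effect=0.0))  # roughly 0.05
```

The point of the sketch is that Fisher’s standard is a property of the whole sequence of replications, not of any single lucky success.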
That’s the ideal,
at any rate. In practice, science is a bit immunosuppressed. Some experiments,
of course, are hard to repeat. If your study measures a four-year-old’s ability
to delay gratification and then relates these measurements to life outcomes
thirty years later, you can’t just pop out a replication.
But even studies
that could be replicated often aren’t. Every journal wants to publish a
breakthrough finding, but who wants to publish the paper that does the same
experiment a year later and gets the same result? Even worse, what happens to
papers that carry out the same experiment and don’t find a significant result?
For the system to work, those experiments need to be made public. Too often,
they end up in the file drawer instead.
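To see how much damage the file drawer does, here is another hypothetical sketch, with numbers chosen purely for illustration: a thousand labs each test an effect that doesn’t exist, and only the significant results see print.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

published, filed_away = 0, 0
for _ in range(1_000):                  # 1,000 labs test a nonexistent effect
    control = rng.normal(0.0, 1.0, 30)
    treated = rng.normal(0.0, 1.0, 30)  # same distribution: no real effect
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        published += 1    # the "breakthrough" goes to a journal
    else:
        filed_away += 1   # the null result goes in the file drawer

print(published, filed_away)  # roughly 50 published, 950 filed away
# Every one of those ~50 published "findings" is a fluke, but a reader
# of the journals sees only the successes.
```

A reader of the resulting literature sees fifty exciting discoveries and none of the nine hundred fifty quiet failures that would have exposed them.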