One of the fundamental principles of science is that the results of any experiment should be reproducible. Reproducibility is essential because it means that the results can be relied upon, as they are more likely to be true. Unfortunately, there is little fame in replicating someone else’s study; it is also hard to get such studies funded (because they are not ‘novel’). Consequently, many studies are not repeated and many findings stand alone without verification from separate, independent researchers. This is a problem because often when studies are replicated, they fail to reproduce the original findings.
To get the replication/reproduction terminology clear from the start, I will refer to replicating or repeating studies (doing the same research again, preferably independently) and whether or not the replicated study reproduces the same results.
From the definitions above, we have two problems: firstly, studies are often not being replicated, and secondly, when they are replicated, they often fail to reproduce the results of the initial study. Follow me so far?
The good and the bad
- Reproduction of previous results is a good thing: it is a verification of the findings of previous research, therefore increasing the probability that those findings are true.
- Failure of the replication study to reproduce the original study results decreases the probability that the original findings were true. This is bad for the original researchers, but it is good for everyone else because it is science’s way of detecting errors – of self-correcting.
- Failure to replicate studies (at all) is bad for everyone because it means that we are less certain that the original results are true, and we could be holding on to these false beliefs for a very long time, screwing up our practice of medicine.
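To put rough numbers on the three points above, here is a minimal sketch using Bayes' rule. Every figure in it is an assumption chosen purely for illustration, not taken from any study: a 50% prior chance that the original finding is true, a replication with 80% power to detect a real effect, and a 5% false positive rate. The function name is also made up for this example.

```python
def update_probability(prior, power, false_positive_rate, reproduced):
    """Return P(finding is true) after one replication attempt, via Bayes' rule."""
    if reproduced:
        # Replication found the effect: either a true positive or a false positive.
        true_path = prior * power
        false_path = (1 - prior) * false_positive_rate
    else:
        # Replication found nothing: either a missed real effect or a true negative.
        true_path = prior * (1 - power)
        false_path = (1 - prior) * (1 - false_positive_rate)
    return true_path / (true_path + false_path)


# Illustrative assumptions only (not data from any study).
prior = 0.5   # belief in the finding before anyone repeats the study
power = 0.8   # chance the replication detects the effect if it is real
alpha = 0.05  # chance the replication "detects" an effect that is not real

after_success = update_probability(prior, power, alpha, reproduced=True)
after_failure = update_probability(prior, power, alpha, reproduced=False)

print(f"Before any replication:      {prior:.2f}")
print(f"After a successful repeat:   {after_success:.2f}")
print(f"After a failed repeat:       {after_failure:.2f}")
```

Under these assumptions, a successful repeat lifts the probability from 0.50 to about 0.94, and a failed repeat drops it to about 0.17. With no repeat at all, we are stuck at 0.50, which is the third bullet point in a nutshell.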
According to Popper, it is a basic tenet of science that any finding / statement / theory must be falsifiable. If something cannot be disproved then, in effect, it cannot be challenged and becomes dogma, not science. A study or theory that stands up to attempts to falsify it is a more robust one.
Replicating studies goes hand in hand with falsifiability. If nobody is going to repeat a study (and attempt to falsify it) then it doesn’t matter if it is falsifiable or not. Being falsifiable is not enough: theories gain strength from standing up to attempts at falsification, and study findings need to be reproduced if they are to be relied upon.
If they can’t be reproduced, then the conclusions of a study are not only not right, they are not even wrong (Pauli).
An article in Nature from 2012 reported an attempt by researchers to replicate 53 landmark or breakthrough papers in the field of pre-clinical cancer research. Despite multiple attempts at reproducing the findings in their own lab, they could only reproduce the findings of 6 out of the 53 studies. They even went to the extent of contacting original authors and repeating the studies in slightly different ways.
In one case, when told that the original findings could not be reproduced despite 50 attempts at replicating his research, the author of the original paper said that he had done the experiment 6 times himself, and only produced the interesting findings once. That means that the original researchers could not even reproduce their own findings in their own labs. Yet they only reported the positive study, not the negative studies. Depending on how they did it, that is either reporting bias or publication bias, but either way it is biased.
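A quick sketch shows why reporting only the one positive run is so misleading. It assumes that there is no real effect at all, that each run of the experiment is independent, and that each run has a 5% chance of throwing up a false positive. These are illustrative assumptions, not details from the Nature article, and the function name is made up for this example.

```python
def chance_of_at_least_one_positive(attempts, false_positive_rate=0.05):
    """Probability that at least one of several independent runs is 'positive'
    purely by chance, assuming there is no real effect."""
    return 1 - (1 - false_positive_rate) ** attempts


# Compare a single run with six runs (as in the anecdote) and twenty runs.
for attempts in (1, 6, 20):
    chance = chance_of_at_least_one_positive(attempts)
    print(f"{attempts:2d} attempts: {chance:.1%} chance of at least one positive result")
```

Under those assumptions, six attempts give roughly a 26% chance of at least one "interesting" result even when nothing is going on, compared with 5% for a single attempt. Publishing only that one result hides the other five from the reader.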
Worryingly, several of the non-reproducible studies had already spawned new fields of research that were founded on the original findings, yet never validated them through re-testing.
Lamenting the declining success rate of clinical trials and a widening gap between discoveries that were ‘interesting’ and those that were ‘feasible’, a team from Bayer replicated 67 published in-house projects (link here) and could only reproduce the original findings in about 20-25% of them.
Both these papers blame sloppy science by the researchers, and the academic system (which rewards interesting findings).
A 2005 review by Ioannidis looked at the biggest articles (highest number of citations) in three major general medical journals and found that most of them were either not replicated (24%) or were replicated, but the results were not reproduced (no effect - 16%, or lesser effect - 16%).
In a recent review of medical publications testing standards of care, it was found that such studies were more likely to refute the original findings than reaffirm them. Interestingly, this reaffirms Ioannidis’ findings.
Psychology is particularly hard hit by lack of reproducibility, which has sparked such endeavours as the Reproducibility Project (here). This is part of a larger validation project looking to replicate and validate scientific findings (here).
Why is there a lack of repeat studies?
There is a perception that only novel science is rewarding. This may be true, as novel science is rewarded by grant money, promotions, doctorates and fame. Who wants to do a study that someone else has already done, and do it exactly the same way? Well, I do. That is the only way I can find out if those interesting studies that pop up every now and then are true.
- Firstly, scientists need to be more scientific; a lot of this is sloppy science. This is covered in a previous post on Manufactured Significance. The methodology needs to be better, and studies need to have explicit protocols published prior to the research starting.
- Secondly, there need to be more journals like PLOS, which publish anything they get, in full, open access and online, based purely on scientific merit, not newsworthiness or novelty.
- Thirdly, funders (grant funders, industry, universities, governments) need to appropriately prioritise replication research, particularly of important findings that have significant clinical and resource implications.
- Fourthly, the readers of the research need to be aware of the problem, and they need to equip themselves with the tools for determining scientific validity and start using those tools.
The bottom line
For many reasons, there is a lack of reproduction of scientific findings: firstly because studies are not being repeated, and secondly because, when they are, the results are often not reproduced. This knowledge gap is being recognised, but it will only be filled when those who control research output (the scientists themselves and those who control financial and academic rewards) appreciate the importance of repeating other people’s studies.
- The lack of reproducibility in current research was also highlighted in this Economist article from 2013 with examples across many fields of science.
- Another study, in psychological science, from 2015 is here.
- Another blogger has also covered this topic here, and offered some solutions here.
- Organisations are now aware of these problems. The US Office of Research Integrity is a good reference, as are Retraction Watch, the Reproducibility Initiative, and the Committee on Publication Ethics.