One of the fundamental principles of science is that the
results of any experiment should be reproducible. Reproducibility is essential
because results that have been reproduced can be relied upon: they are more likely
to be true. Unfortunately, there is little fame in replicating someone else’s
study; it is also hard to get such studies funded (because they are not ‘novel’).
Consequently, many studies are not repeated and many findings stand alone
without verification from separate, independent researchers. This is a problem
because often when studies are
replicated, they fail to reproduce the original findings.
To get the replication/reproduction terminology clear from
the start: I will refer to replicating or repeating studies (doing the same research again, preferably independently),
and to whether or not the replication study reproduces the same results.
Two problems
From the definitions above, we have two problems: firstly, studies are often not being replicated;
and secondly, when they are replicated, they often fail to reproduce the
results of the initial study. Follow me so far?
The good and the bad
- Reproduction of previous results is a good thing: it is a verification of the findings of previous research, therefore increasing the probability that those findings are true.
- Failure of the replication study to reproduce the original study results decreases the probability that the original findings were true. This is bad for the original researchers, but it is good for everyone else because it is science’s way of detecting errors – of self-correcting.
- Failure to replicate studies (at all) is bad for everyone because it means that we are less certain that the original results are true, and we could be holding on to these false beliefs for a very long time, screwing up our practice of medicine.
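The probability shifts described in the list above are really just Bayesian updating. As a rough illustration (not from the original post), here is a toy Bayes update; the prior and the reproduction rates are assumed numbers chosen purely to make the point:

```python
# Toy Bayesian update: how a replication attempt shifts the probability
# that a finding is true. All numbers are illustrative assumptions.

def update(prior, p_repro_if_true, p_repro_if_false, reproduced):
    """Posterior probability that the finding is true, given whether
    an independent replication reproduced it."""
    if reproduced:
        num = p_repro_if_true * prior
        den = p_repro_if_true * prior + p_repro_if_false * (1 - prior)
    else:
        num = (1 - p_repro_if_true) * prior
        den = (1 - p_repro_if_true) * prior + (1 - p_repro_if_false) * (1 - prior)
    return num / den

prior = 0.5  # assume we start 50:50 on the original finding
print(round(update(prior, 0.8, 0.2, reproduced=True), 2))   # rises to 0.8
print(round(update(prior, 0.8, 0.2, reproduced=False), 2))  # falls to 0.2
```

The exact numbers don't matter; the point is that a successful reproduction pushes the probability up and a failed one pushes it down, and with no replication at all the probability never moves from the prior.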
Falsifiability
According to Popper, it is a basic tenet of science that any
finding / statement / theory must be falsifiable. If something cannot be
disproved then, in effect, it cannot be challenged and becomes dogma, not
science. A study or theory that stands up to attempts to falsify it is a more
robust one.
Reproducibility
Replicating studies goes hand in hand with falsifiability. If
nobody is going to repeat a study (and attempt to falsify it) then it doesn’t
matter if it is falsifiable or not. Being falsifiable is not enough: theories
gain strength from standing up to attempts at falsification, and study findings
need to be reproduced if they are to be relied upon.
If they can’t be
reproduced, then the conclusions of a study are not only not right, they are
not even wrong (Pauli).
Examples
An
article in Nature from 2012 reported an attempt by researchers to replicate
53 landmark or breakthrough papers in the field of pre-clinical cancer
research. Despite multiple attempts at reproducing the findings in their own lab,
they could only reproduce the findings of 6 out of the 53 studies. They even
went to the extent of contacting original authors and repeating the studies in
slightly different ways.
In one case, when told that, despite 50 attempts at replicating his original
research, the original findings could not be reproduced,
the author of the original paper said that he had run the experiment six times himself, and
only produced the interesting findings once. That means that the original
researchers could not even reproduce
their own findings in their own lab, yet they only reported the positive
study, not the negative ones. Depending on how they did it, that is either
reporting bias or publication bias, but either way it is biased.
Worryingly, several of the non-reproducible studies had
already spawned new fields of research that were founded on the original
findings, yet never validated them through re-testing.
Lamenting the declining success rate of clinical trials and
a widening gap between discoveries that were ‘interesting’ and those that were
‘feasible’, a team from Bayer replicated 67 published in-house projects
and could only reproduce the original findings in about 20–25% of them.
Both these papers blame sloppy science by the researchers,
and the academic system (which rewards interesting findings).
A 2005
review by Ioannidis looked at the biggest articles (highest number of
citations) in three major general medical journals and found that most of them
were either not replicated (24%) or were replicated, but the results were not
reproduced (no effect - 16%, or lesser effect - 16%).
In a recent
review of medical publications testing standards of care, it was found that
such studies were more likely to refute the original findings than reaffirm
them. Interestingly, this reaffirms Ioannidis’ findings.
Psychology is particularly hard hit by the lack of
reproducibility, which has sparked such endeavours as the Reproducibility
Project, itself part of a
larger validation project looking to replicate and validate scientific findings.
Why is there a lack
of repeat studies?
There is a perception that only novel science is rewarding.
This may be true, as novel science is rewarded by grant money, promotions, doctorates
and fame. Who wants to do a study that someone else has already done, and do it
exactly the same way? Well, I do. That is the only way I can find out if those
interesting studies that pop up every now and then are true.
The solution
- Firstly, scientists need to be more scientific; a lot of this is simply sloppy science. This is covered in a previous post on Manufactured Significance. Methodology needs to be better, and studies need to have explicit protocols published before the research starts.
- Secondly, there need to be more journals like PLOS, which publish everything they receive, in full and open access, all online, based purely on scientific merit, not newsworthiness or novelty.
- Thirdly, funders (grant funders, industry, universities, governments) need to appropriately prioritise replication research, particularly of important findings that have significant clinical and resource implications.
- Fourthly, the readers of the research need to be aware of the problem, and they need to equip themselves with the tools for determining scientific validity and start using those tools.
The bottom line
For many reasons, there is a lack of reproduction of scientific
findings. Firstly because studies are not being repeated, and secondly because
when they are, the results are often not reproduced. This knowledge gap is
being recognised, but it will only be filled when those who control research
output (the scientists themselves and those who control financial and academic
rewards) appreciate the importance of repeating other people’s studies.
Other links
- The lack of reproducibility in current research was also highlighted in this Economist article from 2013 with examples across many fields of science.
- Another study in psychological science from 2015 made the same point.
- Another blogger has also covered this topic and offered some solutions.
- Organisations are now aware of these problems. The US Office of Research Integrity is a good reference, as are Retraction Watch, The Reproducibility Initiative, and the Committee on Publication Ethics.
Comments
What about too many reproducibility studies leading to research waste? There are times when we do not need any more repeated studies, most often when people are trying to prove that something works after previous studies have shown it doesn’t. In that instance it is important to consider whether a repeated study is worthwhile and what it would add. One can look to systematic reviews and meta-analyses to determine how large a study would need to be to overturn the results of previous studies.
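The commenter's last point can be made concrete with a back-of-the-envelope calculation (my own sketch, not from the comment): pool the existing studies with a fixed-effect, inverse-variance meta-analysis, then grow a hypothetical null-result study until the pooled estimate drops below conventional significance. All effect sizes and standard errors below are made-up numbers for illustration:

```python
# Toy fixed-effect meta-analysis: how big would a new null-result study
# need to be to pull the pooled effect below z = 1.96? All inputs are
# hypothetical numbers chosen for illustration.
import math

def pooled(effects, ses):
    """Fixed-effect (inverse-variance) pooled estimate and standard error."""
    weights = [1 / se**2 for se in ses]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

# Hypothetical prior studies: effect sizes and their standard errors.
effects, ses = [0.5, 0.4, 0.6], [0.15, 0.20, 0.25]

# Add a new study observing zero effect, with standard error shrinking
# as sigma / sqrt(n); grow n until the pooled result loses significance.
sigma = 1.0  # assumed per-subject standard deviation
n = 1
while True:
    est, se = pooled(effects + [0.0], ses + [sigma / math.sqrt(n)])
    if abs(est / se) < 1.96:
        break
    n += 1
print(n)  # approximate sample size needed to overturn the pooled result
```

With these invented inputs the required sample size runs into the hundreds, which is the commenter's point: overturning a consistent body of evidence takes a large, well-powered study, not another small one.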
Excellent point. Most of the failure to reproduce/replicate is for studies with positive/surprising results, like antibiotics curing back pain. I think those studies HAVE to be replicated. Also, in my own field, I find that 'negative' research is not taken up or accepted, and therefore a replication study could strengthen the evidence base and be more likely to effect a change in practice.
ReplyDeleteAn example of what you are saying would be PRP injections, where the evidence is overwhelmingly negative, but people keep doing more research, and occasionally studies come up positive. I guess I have less of a problem with that (part of my No-such-thing-as-bad-data philosophy) except, as you say, that it is a waste of resources.
Only 16% of highly cited articles were contradicted later, as reported in the study by Ioannidis. Is that surprising, given the variability in populations, selection criteria, outcome measurements, etc. that occurs in clinical studies, even those considered identical by Ioannidis? It's hardly news; I'm surprised it's not more. Looking at these contradicting studies often means examining the subtle differences between the "identical" studies as a way of determining whether those differences are responsible for the outcomes. I often find it more valuable to look at contradicting studies this way, rather than throwing up one's arms and declaring that all research is flawed.
Thanks John, and I agree: exploring why similar studies show different results can be very revealing. The Nature studies were a little different, as they tried to replicate lab studies precisely, and it appears that, despite a public perception that lab research is somehow more 'pure', lab research is often a lot less rigorous than clinical research (less blinding, randomisation etc.), possibly because of the high methodological and ethical standards now required of clinical research. See my recent (June) post for a similar comment on animal research.
And I definitely don't want to give the impression that all research should be disregarded, only that a lot of it is biased, and that bias in research needs to be reduced (which, I believe, it slowly is) and to be detected and considered when reading research.
As a layman, this is why I have so little faith in medical practitioners. However, it should be noted that sloppy science applies not only to medicine but to other areas of daily life. It is almost a regression to belief in magic sometimes: X waved the wand once and it produced this, and this was funded by Y. Never a follow-up to X and his Y-funded study. "It" becomes accepted protocol.
Thanks and yes, sloppy science is everywhere, just ask any decent economist. But that doesn't mean that science is bad, it is just done badly. And interpreted badly. There is still good science in medicine, our priority should be to improve the quality and to be able to detect the bad science.
Excellent article! There was a recent NPR segment on the toll these sloppy studies take on patients in clinical trials. http://www.npr.org/blogs/health/2014/09/15/344084239/patients-vulnerable-when-cash-strapped-scientists-cut-corners
Thanks for the link. I like the term they used: Wishful Science. I think that explains a lot of what we see in research and practice, where "wishful thinking" and science get a bit mixed up.