Thursday, 18 September 2014

The replication problem

One of the fundamental principles of science is that the results of any experiment should be reproducible. Reproducibility is essential because it means that the results can be relied upon, as they are more likely to be true. Unfortunately, there is little fame in replicating someone else’s study; it is also hard to get such studies funded (because they are not ‘novel’). Consequently, many studies are not repeated and many findings stand alone without verification from separate, independent researchers. This is a problem because often when studies are replicated, they fail to reproduce the original findings.
What am I talking about?
To get the replication/reproduction terminology clear from the start, I will refer to replicating or repeating studies (doing the same research again, preferably independently) and whether or not the replicated study reproduces the same results.

Two problems
From the definitions above, we have two problems: firstly, studies are often not being replicated; secondly, when they are replicated, they often fail to reproduce the results of the initial study. Follow me so far?

The good and the bad
  1. Reproduction of previous results is a good thing: it is a verification of the findings of previous research, therefore increasing the probability that those findings are true.
  2. Failure of the replication study to reproduce the original study results decreases the probability that the original findings were true. This is bad for the original researchers, but it is good for everyone else because it is science’s way of detecting errors – of self-correcting.
  3. Failure to replicate studies (at all) is bad for everyone because it means that we are less certain that the original results are true, and we could be holding on to false beliefs for a very long time, screwing up our practice of medicine.
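To put rough numbers on the three points above, here is a minimal sketch of how one replication attempt shifts the probability that a finding is true, using Bayes' rule. The prior of 0.5, the power of 0.8 and the false positive rate of 0.05 are illustrative assumptions of mine, not figures from any of the studies discussed here, and the function name is made up for the example.

```python
def update_probability(prior, reproduced, power=0.8, alpha=0.05):
    """Probability that a finding is true after one replication attempt.

    prior      -- probability the finding is true before the replication (assumed)
    reproduced -- True if the replication reproduced the original result
    power      -- chance a replication detects a real effect (assumed 0.8)
    alpha      -- chance a replication "finds" an effect that is not there (assumed 0.05)
    """
    if reproduced:
        true_path = prior * power
        false_path = (1 - prior) * alpha
    else:
        true_path = prior * (1 - power)
        false_path = (1 - prior) * (1 - alpha)
    return true_path / (true_path + false_path)


prior = 0.5  # hypothetical starting belief in the original finding
after_success = update_probability(prior, reproduced=True)
after_failure = update_probability(prior, reproduced=False)

print(f"No replication at all:   {prior:.2f}")
print(f"Replicated, reproduced:  {after_success:.2f}")
print(f"Replicated, not reproduced: {after_failure:.2f}")
```

Under these assumptions a successful reproduction lifts the probability well above the starting point, a failed one drops it well below, and with no replication at all we are simply stuck at the prior.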

According to Popper, it is a basic tenet of science that any finding / statement / theory must be falsifiable. If something cannot be disproved then, in effect, it cannot be challenged and becomes dogma, not science. A study or theory that stands up to attempts to falsify it is a more robust one.

Replicating studies goes hand in hand with falsifiability. If nobody is going to repeat a study (and attempt to falsify it) then it doesn’t matter if it is falsifiable or not. Being falsifiable is not enough: theories gain strength from standing up to attempts at falsification, and study findings need to be reproduced if they are to be relied upon.

If they can’t be reproduced, then the conclusions of a study are not only not right, they are not even wrong (Pauli).

An article in Nature from 2012 reported an attempt by researchers to replicate 53 landmark or breakthrough papers in the field of pre-clinical cancer research. Despite multiple attempts at reproducing the findings in their own lab, they could only reproduce the findings of 6 out of the 53 studies. They even went to the extent of contacting original authors and repeating the studies in slightly different ways.

In one case, when told that the original findings could not be reproduced despite 50 attempts at replicating his research, the author of the original paper said that he had done it 6 times himself, and only produced the interesting findings once. That means that the original researchers could not even reproduce their own findings in their own lab. Yet they only reported the positive study, not the negative studies. Depending on how they did it, that is either reporting bias or publication bias, but either way it is biased.
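As a rough illustration of why one positive result out of six attempts is so unconvincing, the sketch below calculates the chance of getting at least one "significant" result purely by chance when there is no real effect. It assumes independent attempts and the conventional 5% significance threshold; neither assumption comes from the Nature article, and the function name is hypothetical.

```python
def chance_of_false_positive(attempts, alpha=0.05):
    """Chance of at least one false positive in a number of independent
    attempts, assuming there is no true effect (alpha is an assumed threshold)."""
    return 1 - (1 - alpha) ** attempts


one_attempt = chance_of_false_positive(1)
six_attempts = chance_of_false_positive(6)

print(f"One attempt:  {one_attempt:.1%}")
print(f"Six attempts: {six_attempts:.1%}")
```

Under these assumptions, six attempts give roughly a one in four chance of at least one positive finding even when nothing is going on, which is why reporting only the positive attempt is so misleading.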

Worryingly, several of the non-reproducible studies had already spawned new fields of research that were founded on the original findings, yet those findings were never validated through re-testing.

Lamenting the declining success rate of clinical trials and a widening gap between discoveries that were ‘interesting’ and those that were ‘feasible’, a team from Bayer replicated 67 published in-house projects and could only reproduce the original findings in about 20-25% of them.

Both these papers blame sloppy science by the researchers, and the academic system (which rewards interesting findings).

A 2005 review by Ioannidis looked at the biggest articles (highest number of citations) in three major general medical journals and found that most of them were either not replicated (24%) or were replicated, but the results were not reproduced (no effect - 16%, or lesser effect - 16%).

In a recent review of medical publications testing standards of care, it was found that such studies were more likely to refute the original findings than reaffirm them. Interestingly, this reaffirms Ioannidis’ findings.

Psychology is particularly hard hit by lack of reproducibility, which has sparked such endeavours as the Reproducibility Project. This is part of a larger validation project looking to replicate and validate scientific findings.

Why is there a lack of repeat studies?
There is a perception that only novel science is rewarding. This may be true, as novel science is rewarded by grant money, promotions, doctorates and fame. Who wants to do a study that someone else has already done, and do it exactly the same way? Well, I do. That is the only way I can find out if those interesting studies that pop up every now and then are true.

The solution
  1. Firstly, scientists need to be more scientific; a lot of this is sloppy science. This is covered in a previous post on Manufactured Significance. The methodology needs to be better, and studies need to have explicit protocols published prior to the research starting.
  2. Secondly, there need to be more journals like PLOS, which publish anything they get, in full and open access, all online, based purely on scientific merit, not newsworthiness or novelty.
  3. Thirdly, funders (grant funders, industry, universities, governments) need to appropriately prioritise replication research, particularly of important findings that have significant clinical and resource implications.
  4. Fourthly, the readers of the research need to be aware of the problem, and they need to equip themselves with the tools for determining scientific validity and start using those tools.

The bottom line
For many reasons, there is a lack of reproduction of scientific findings: firstly because studies are not being repeated, and secondly because when they are, the results are often not reproduced. This knowledge gap is being recognised, but it will only be filled when those who control research output (the scientists themselves and those who control financial and academic rewards) appreciate the importance of repeating other people’s studies.

  1. What about too many reproducibility studies that lead to research waste? There are times when we do not need any more repeated studies. This most often occurs when people are trying to prove that something works when the previous studies have shown that it doesn't. In this instance it is important to consider whether a repeated study is worthwhile and what it would add. One can look to systematic reviews and meta-analyses to determine how large a study would need to be to overturn the results of previous studies.

  2. Excellent point. Most of the failure to reproduce/replicate is for studies with positive/surprising results, like antibiotics curing back pain. I think those studies HAVE to be replicated. Also, in my own field, I find that 'negative' research is not taken up or accepted, and therefore a replication study could strengthen the evidence base and be more likely to effect a change in practice.
    An example of what you are saying would be PRP injections, where the evidence is overwhelmingly negative, but people keep doing more research, and occasionally studies come up positive. I guess I have less of a problem with that (part of my No-such-thing-as-bad-data philosophy) except, as you say, that it is a waste of resources.

  3. Only 16% of highly cited articles were contradicted later, as reported in the study by Ioannidis. Is that surprising, given the variability in populations, selection criteria, outcome measurements, etc. that occur in clinical studies, even those which are considered identical by Ioannidis? It's hardly news. I'm surprised it's not more. One of the things about looking at these contradicting studies is often examining the subtle differences between these "identical" studies as a way of determining if these differences are responsible for the outcomes. I often find it's more valuable looking at contradicting studies this way, rather than throwing one's arms up in the air and declaring that all research is flawed.

    1. Thanks John and I agree: exploring why similar studies show different results can be very revealing. The Nature studies were a little different as they tried to replicate lab studies precisely. Despite a public perception that lab research is somehow more 'pure', it appears that lab research is often a lot less rigorous than clinical research (less blinding, randomisation etc.), possibly because of the high methodological and ethical standards now required of clinical research. See my recent (June) post for a similar comment on animal research.
      And I definitely don't want to give the impression that all research should be disregarded, only that a lot of it is biased, and that bias in research needs to be reduced (which, I believe it is, slowly) and to be detected and considered when reading research.

  4. As a layman this is why I have so little faith in medical practitioners. However, it should be noted that the sloppy science applies not only to medicine but to other areas of interest in daily life. It is almost a regression to belief in magic sometimes. X waved the wand once and it produced this, and this was funded by Y. Never a follow-up to X and his Y-funded study. "It" becomes accepted protocol.

    1. Thanks and yes, sloppy science is everywhere; just ask any decent economist. But that doesn't mean that science is bad; it is just done badly. And interpreted badly. There is still good science in medicine. Our priority should be to improve the quality and to be able to detect the bad science.

  5. Excellent article! There was a recent NPR segment on the toll these sloppy studies take on patients in clinical trials.

    1. Thanks for the link. I like the term they used: Wishful Science. I think that explains a lot of what we see in research and practice, where "wishful thinking" and science get a bit mixed up.