Replication crisis

Creating new knowledge is the central driver of science. But we’re so obsessed with newness that we don’t spend enough time making sure we’re right.

Illustration by Kallum Best

This is an editorial for Issue 28 of Lateral by editor-in-chief Tessa Evans, who is hopeful that poking at the problems within science won't stir up the anti-science brigade.

I hate to break it to you, but striking a power pose like a superhero doesn’t actually reduce your stress and increase your testosterone levels. Researchers attempting to recreate the study touting that particular piece of pop psych were unable to get the same result, calling the original study’s validity into question.

And it’s not the only research to get such treatment. The results of study after study are being questioned in what people are calling the replication crisis.

Science gains validity when evidence builds up, with replication serving as a built-in quality control. Results are replicable when you repeat a study to obtain fresh data and get the same results. This is not to be confused with results being reproducible, which is the case when you analyse the same data again and get the same results.
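To make that distinction concrete, here is a minimal sketch in Python (my own illustration, not from the article) using a hypothetical two-group study: reproducing means re-running the same analysis on the same data, while replicating means collecting fresh data and running the same analysis again.

```python
# A minimal sketch of reproducibility vs replicability, using a hypothetical
# two-group comparison. Names and numbers are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_study(n=30, effect=0.5):
    """Collect fresh data for a hypothetical two-group study."""
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)
    return control, treated

def analyse(control, treated):
    """The analysis pipeline: here, a two-sample t-test."""
    return stats.ttest_ind(treated, control).pvalue

original_data = run_study()
p_original = analyse(*original_data)

# Reproduction: same data, same analysis -- returns the identical p-value.
p_reproduced = analyse(*original_data)

# Replication: fresh data, same analysis -- returns a new, independent p-value.
p_replicated = analyse(*run_study())

print(p_original, p_reproduced, p_replicated)
```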

But something’s gone wrong with the way we set up the science system. Science has become so driven by the quest to generate new knowledge, through incentivising innovative ideas and valuing the ‘cutting edge’, that it’s forgotten that innovation alone isn’t enough.

There are several reasons for this. Funding doesn’t favour repetition, for example, and career advancement in science tends to depend on researchers reliably delivering new results. But focusing solely on the new ignores that scientific evidence always contains uncertainty. Without repetition we can’t reduce that uncertainty and actually progress science in a meaningful way.

The replication crisis plays out in different ways in different areas of science. One example is organic chemistry, a field dominated by synthesising new molecules or finding new routes to make them. Due to a lack of replication (or because no one publishes failed attempts), researchers in the field often struggle to recreate a chemical reaction from someone else’s method, wasting both time and materials.

As an ex-chemist, I can tell you this happens repeatedly. But one journal, Organic Syntheses, has invested time and money into preventing this by independently verifying every experiment submitted for publication since it launched in 1921, rejecting around 7.5% of submitted papers each year as a result.

While verifying a chemical reaction is fairly straightforward, since most chemistry labs worldwide contain the same equipment, biology and physics experiments often require specialised or costly equipment that makes replication much more difficult, or even impossible, at the peer review stage.

And in chemistry there are often workarounds, even if it takes an additional six steps to get the molecule you want. In paleoarchaeology, the stakes are much higher, as a lack of replication can lead to the misidentification of species. Last year, Tasmania lost its very own species of extinct penguin when ancient DNA analysis revealed that the only specimen of this rare bird actually contained bones from three different penguin species, two of which weren’t even Australian. Replication does happen in the natural sciences, but often only once technology becomes cheaper and more accessible.

Nowhere is the replication crisis more pressing than in psychology. People are fickle subjects, and no topic divides us more than ideas about how we think. In 2015, a group of 270 psychology researchers worried about the integrity of their field attempted to replicate 100 psychological studies. They were able to replicate only 39, and even among these, most of the reported effects halved in size compared to the original studies. As they note in the study: “A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors.”

Since then, the spotlight has been on psychology, with attempted replications ‘debunking’ or exposing a number of studies whose results could not be repeated. Reasons for replication failure range from cherry-picking statistics to false positives (finding an effect when there wasn’t one). Cornerstone study after cornerstone study has come tumbling down in the face of renewed scrutiny.

Stanford now runs a site called Best Practises in Science that lists replication failures and what both the original and replicated studies found. Among them is the infamous Stanford Prison Experiment. The most high-profile study on the list, the experiment assigned paid participants to act as inmates or guards in a mock prison at Stanford University in 1971. The experiment was famously called off early because of the brutality the guards inflicted on the inmates.

The BBC conducted a replication of the study in 2002 and found almost the opposite result: the prisoners ended up overpowering the guards. The BBC researchers argue that the outcome of the Stanford Prison Experiment was produced not by the prison setting but by the expectations set by researcher and prison superintendent Philip Zimbardo.

The BBC study’s results have since been published in scientific journals and textbooks. But the Stanford Prison Experiment seems embedded in our cultural lexicon; I remember learning about it in undergraduate psychology almost a decade ago, well after the BBC study. And people are still interested now: this year, Medium published recordings of Zimbardo that provide strong evidence that the guards in the experiment were coached to be cruel.

On the other hand, a replication of the Milgram experiment (an obedience study of a similar vintage to the Stanford Prison Experiment) showed that its results did hold up to scrutiny, indicating that we do feel less responsible for our actions when following orders.

All this chatter about the issue is filtering through to scientists in all disciplines. In 2016, the journal Nature surveyed more than 1,500 scientists and found that over two-thirds of researchers had tried and failed to reproduce another scientist’s experiments, and half had failed to reproduce their own. Far from treating this as a mortal blow to the respectability of science, two-thirds didn’t think a failure to reproduce a study meant its result was wrong, and most said they still trust the published literature.

Australia is also starting to highlight these questionable research practices. A survey of 800 Australian ecologists and evolutionary biologists, run by researchers from the University of Melbourne, found that two-thirds had at least once failed to report results because they were not statistically significant (cherry-picking), and that over a third had collected more data after checking whether their results were statistically significant.
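To see why that second practice is a problem, here is a minimal simulation in Python (my own sketch, not part of the survey): even when there is no real effect, repeatedly peeking at the data and topping up the sample until a test comes out significant pushes the false-positive rate well above the nominal 5%.

```python
# Simulating 'optional stopping': peek at significance, and if the result
# isn't significant yet, collect more data and test again.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ALPHA = 0.05
N_SIMULATIONS = 2000
false_positives = 0

for _ in range(N_SIMULATIONS):
    # Both groups come from the SAME distribution, so any 'significant'
    # difference is a false positive.
    a = rng.normal(0.0, 1.0, 20)
    b = rng.normal(0.0, 1.0, 20)
    for _ in range(5):  # up to five peeks at the data
        if stats.ttest_ind(a, b).pvalue < ALPHA:
            false_positives += 1
            break
        # Not significant yet? Add 10 more observations per group and retest.
        a = np.concatenate([a, rng.normal(0.0, 1.0, 10)])
        b = np.concatenate([b, rng.normal(0.0, 1.0, 10)])

rate = false_positives / N_SIMULATIONS
print(f"False-positive rate with optional stopping: {rate:.1%} (nominal: {ALPHA:.0%})")
```

Run honestly, with a single test at a fixed sample size, the same setup would flag a false positive only about 5% of the time.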

It might seem like this scientific navel-gazing isn’t all that important; it’s probably not too disappointing that posing like a superhero won’t save your day. But studies like the Stanford Prison Experiment, which claim to explain control dynamics between the powerful and their subordinates, can have lasting effects on policy and on how we interact with others.

However, things are looking up, with people designing resources to address this very issue. A group from the University of Guelph in Canada has used the replication crisis to create teaching guidelines on questionable research practices for postgraduate students, aiming to “steer students away from common practices that are ill-advised” and to cut the issue off at the point where psychology research starts.

Personally, I’m glad that science is starting to take a long, hard look at itself. Being critical of your own work is a key part of being a scientist — and a quality that will only make it more robust. Culture change won’t occur overnight, but shining a light on these issues will hopefully challenge some bad habits that have crept in while we’ve been rushing to keep up within the troubled science system.

For all that talk of replication, I hope you won’t find we’re repeating ourselves as we delve into all things imitation. You’ll find out about the doomed alchemical quest of turning things into gold, how meat can be grown in a lab, how cells mimic death, a heartfelt defence of imitation in a world of innovation-lovers, the complex chemistry of manufacturing aromas, and an exploration of the issues surrounding copycat suicides.