“What gets us into trouble is not what we don’t know. It’s what we know for sure that just ain’t so.”

― Mark Twain

The push for education to be research informed continues, and with good reason. The history of medicine is littered with folk wisdom: from miasma to trepanning, from three divine universal forces to four humours to five elements, from phrenology to, well, you get the idea. The rise of the scientific method altered the course of Western medicine for the better.

Unfortunately, the same cannot be said for teaching. In just the past few decades we have lived through ideas like learning styles, de Bono’s thinking hats, Bloom’s taxonomy, misguided notions of differentiation, and a litany of other approaches resulting in little to no positive impact on outcomes. So surely the recent rise of researchEd and the like is a good thing, yes?

The honest answer is “maybe”.

The prompt for this blog was some reflections on evidence, followed by a couple of quick Twitter polls.

In one, some 78% of respondents agreed or strongly agreed that their teaching practice is guided by research/evidence. Most of my followers work in the English education system, and perhaps the current Ofsted Education Inspection Framework has contributed to that – it is, after all, based on substantial evidence.

In the other, 85% of respondents admitted to having never heard of the replication crisis in scientific research. If you’re reading this as one of the 12% claiming at least a basic understanding of the idea (one would hope this includes all ELEs and Research Leads), you can perhaps now start to see a problem emerge.

Science, the pursuit under which all research and evidence falls, is no longer the self-correcting process it is imagined to be. Most of us will have some understanding of the basic scientific method – hypothesise, test, analyse, peer review – indeed, it is taught as part of GCSE science courses. The issue is that it no longer quite works as it is supposed to.

The “replication crisis”, a term that emerged in the early 2010s, is so called because a worryingly large amount of scientific research fails to replicate, or replication is never attempted in the first place. Both of these should ring alarm bells. Top journal Nature polled over 1,500 scientists, with 70% reporting that they had tried and failed to replicate another group’s experiments, and just over half reporting that they had failed to replicate their own (ref). The failure to replicate is most prevalent in psychology and the social sciences (because humans), under which educational research falls. For example, in 2015, researchers writing in Science published the results of their attempts to replicate 100 experiments published in top psychology journals in 2008 – just 39% of the original results were replicated.

How has this happened? For a variety of reasons that I won’t explore here; if you are interested, start by looking up the following terms – publication bias, selection bias, reporting bias (aka selective reporting), p-hacking and the “publish or perish” approach to academic work (that is, academic success is built upon publication, leading to a proliferation of novel studies and low-quality journals at the expense of replication). Extensive self-referencing is another. All of this is before we approach the idea that the peer review system is broken (unpaid, unrecognised work causing a shortage of reviewers, opening the door to “recommendation” cronyism) and that activist politics has infiltrated academia (e.g.). The Mertonian norms of universalism, communalism, disinterestedness and organised scepticism seem alien to many aspects of modern research.
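To make one of those terms concrete, here is a minimal, purely illustrative sketch of p-hacking in Python. Everything in it is simulated and assumed rather than drawn from any study cited in this post: if a researcher measures many outcomes under a genuinely null effect and reports whichever comparison clears p < 0.05, the real false-positive rate climbs far above the nominal 5%.

```python
# Purely illustrative sketch of p-hacking: measure many outcomes under a
# true null effect and report whichever comparison "works". Assumes numpy
# and scipy are available; every number is simulated, nothing comes from
# the studies cited in this post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 2000      # simulated "studies"
n_outcomes = 10      # outcomes measured per study
n_per_group = 30     # pupils per condition

false_positives = 0
for _ in range(n_trials):
    # The intervention genuinely does nothing: both groups share one distribution.
    control = rng.normal(0.0, 1.0, size=(n_outcomes, n_per_group))
    intervention = rng.normal(0.0, 1.0, size=(n_outcomes, n_per_group))
    p_values = [stats.ttest_ind(intervention[i], control[i]).pvalue
                for i in range(n_outcomes)]
    if min(p_values) < 0.05:  # report only the outcome that "worked"
        false_positives += 1

print(f"Share of null studies reporting a 'significant' effect: "
      f"{false_positives / n_trials:.0%}")  # roughly 40%, not 5%
```

With ten outcomes per “study”, roughly 40% of interventions that do nothing at all still produce at least one “significant” result – exactly the kind of finding that then fails to replicate.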

This is a problem for education in a number of ways. Many educational research experiments are based on small sample sizes. Where a sample size is small, the effects of false positives are amplified, and when the work is replicated at a larger scale the effects are often vastly reduced, if they are present at all.

The second problem is that educational studies are rarely replicated at all. Makel and Plucker (2014) examined the top 100 education journals and found that just 0.13% of published articles were replication studies. That’s not a typo. The replication of educational research rounds to 0%. Of the replications that did exist, around half were carried out by the original researchers, with 89% reproducibility; when a different research team attempted the replication, they were successful just 54% of the time – barely more than a coin toss.

Replication is supposed to be the “cornerstone of science”, fundamental to the notion of ‘self-correction’, but at present it isn’t even an afterthought. Virtually no educational research is replicated, which means any single study is of limited value in and of itself. Popular books like “77 Studies…” hardly help matters. With educational research in particular, there is the compounding issue of ill-defined terms: in psychology there are, at least, generally agreed definitions and measures – the same cannot be said for education.

What do we know for sure – “the evidence shows…” – that just isn’t so? Dylan Wiliam uses the brilliant phrase “lethal mutations”, but what if the effect was simply never real to begin with? What if our interpretation of a real effect renders it useless? What use to us, for example, is growth mindset, if only Dweck seems capable of replicating the effect? School time is a finite resource; we owe it to our students to use it wisely.
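The small-sample point above can be illustrated with another rough sketch. The true effect size and every figure below are simulated assumptions, not a reanalysis of Makel and Plucker or anything else cited here: when only statistically significant results get written up, small studies systematically exaggerate the effect, and a larger replication then appears to “shrink” it.

```python
# Rough sketch of effect-size inflation ("winner's curse") with small samples.
# The true effect and all results below are simulated assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2    # assumed modest true effect, in standard-deviation units
n_studies = 5000

def published_effects(n_per_group):
    """Effect estimates from the simulated studies that happened to reach p < 0.05."""
    kept = []
    for _ in range(n_studies):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_effect, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            kept.append(treated.mean() - control.mean())
    return np.array(kept)

for n in (20, 200):
    observed = published_effects(n)
    print(f"n = {n:>3} per group: true effect {true_effect}, "
          f"mean 'published' effect {observed.mean():.2f} "
          f"({len(observed)}/{n_studies} studies significant)")
# With 20 per group the "published" effect is usually several times the true one;
# with 200 per group the exaggeration is far smaller.
```

A school reading only the small, “significant” studies would conclude the intervention is far more powerful than it really is – and then wonder why the effect never materialises in its own classrooms.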

The quality of ‘evidence’ is a concept that only seems to be growing in importance. How many of us have sat through CPD, whether in-house or otherwise, and been presented with “the evidence” as justification? What evidence? Who are the authors? Where was it published? What was the methodology? Cite the reference and give people time to review it, because not all “evidence” is created equal. We cannot blindly adhere to anything preceded by “the evidence shows”. Go back 15–20 years and you will find published research extolling the virtues of discovery learning and the like. We need to look beyond the abstract that appears to favour our desired outcome, though paywalls can be restrictive, and interrogate the methodology.

Whole-school policy decisions should not be made on the basis of a single research paper in which n = 28. But neither is a single study carried out under lab conditions with hundreds of isolated university-age subjects particularly useful. We need to consider the conditions under which the evidence was obtained, and whether it is reasonable to apply it to the conditions in which we are working. The classroom is a complex social environment, full of fallible and irrational human beings (including us); what works best overall might not work best on a Friday afternoon at the end of a seven-week half term.

But despite all this, this blog is not a call to disregard educational research. Quite the opposite. Scientific evidence is a solid foundation on which to base school improvement and teacher development. We ought to be more invested in educational research, able and willing to identify poor methodology or doubtful conclusions. It is a call to seek out and prioritise robust, replicated studies. Rather than asking “where’s the evidence?”, we should ask “where’s the evidence, how was it obtained, and has it been replicated? Is the conclusion valid given the evidence obtained?” A good place to start might be the Special Issue of Educational Research and Evaluation, due to be published later this year, an outlet for replication work. Educational research is leaps and bounds ahead of where it was in the past, but it still has a long way to go to become a ‘legitimate discipline’ in the eyes of many. This is an inescapable reality we need to keep in mind.

Some further reading/viewing here.