One of the key things the skeptical movement has wanted to encourage from the beginning is good science. One reason is that so much of the research skeptics encounter is of such poor quality. The history of research into paranormal phenomena is a schoolroom for learning about cognitive bias, basic error and even fraud. Science, we are all taught, is the best process we have for establishing the truth, but it is performed by scientists, who exhibit all the human failings common to any other profession.
It turns out that the same profound flaws are turning up in many other areas of science. At a recent meeting of cyber security researchers, there was a lot of discussion of the low statistical power of many studies. Sample sizes may be too small to show a meaningful effect; or the effect may be so small that it’s unclear whether it will translate from the lab to the real world; or the sample may not be representative of the population that will use the results. For example: many of today’s cyber security studies, like many studies in social science, rely on samples drawn from student populations or supplied by the low-paid piece workers on Amazon’s Mechanical Turk. Neither is demographically similar to the people in an average office. If you’re testing how students learn, the fact that you test on students makes sense. If you’re testing how people respond to changes in password policies, the results you get from your sample may have no applicability to middle-aged workers in a law office in Dewsbury.
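To put a number on “too small to show a meaningful effect”, the sketch below, in Python, uses the textbook normal approximation for a two-group comparison to estimate how many participants a study needs as the expected effect shrinks. The helper name and the effect sizes are mine and purely illustrative, not figures drawn from any of the studies discussed.

    import math
    from scipy.stats import norm

    def n_per_group(effect_size, alpha=0.05, power=0.80):
        """Approximate participants needed per group for a two-group comparison.

        Normal approximation: n ~= 2 * ((z_crit + z_power) / d)^2,
        where d is the standardised difference between the groups (Cohen's d).
        """
        z_crit = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
        z_power = norm.ppf(power)          # quantile matching the desired power
        return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)

    # Illustrative effect sizes: a large laboratory effect versus the small
    # effects typical of behavioural interventions such as a new password policy.
    for d in (0.8, 0.5, 0.3, 0.1):
        print(f"Cohen's d = {d}: roughly {n_per_group(d)} participants per group")

Even under these generous assumptions, a small effect calls for several hundred participants per group, well beyond what many small convenience samples provide.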
Well, you might say, cyber security is a young – immature, even – profession. Most of it is folk wisdom passed on from old-timers to newcomers, and “old-timers” are people who by and large entered the profession by accident 25 years ago because they were the only people in their company who weren’t afraid to read the manual to figure out how to set up a firewall. Maybe it’s not unreasonable that cyber security researchers don’t really know how to choose the right sample size and sampling method, or how big an effect they need to see. Or even whether they’re asking the right question.
But … a few days later, I encountered this quote, in Angela Saini’s book Inferior, on the many errors science has made about women: “[The authors of an article in Nature Reviews Neuroscience] pointed out that ‘low statistical power’ was an ‘endemic problem’ in neuroscience.” The article in question, written in 2013, argued that scientists were being pressured to do bad research, “including using small samples of people or magnifying real effects, so they could seem to have sexy results.”
And if that weren’t enough, a day or two later I went on to read Rigor Mortis, a 2017 book on sloppy science. In it, Richard Harris, a science reporter for National Public Radio, examines the state of biomedical research … and finds a mess. Despite a much longer scientific history, biomedical researchers appear to be no better at selecting sample sizes or representative populations. New drugs have been tested solely on men, then prescribed for women. Treatments have been tested on mice and rats without any clear idea of whether the biology is similar enough to make the treatments viable for humans. And everywhere there is a reproducibility crisis. Sometimes this is because there isn’t enough information in the original report to recreate the experiment; sometimes it’s because the original study is just plain wrong. As long ago as 2005, John Ioannidis argued in the journal PLoS Medicine that most published research findings are false.
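The arithmetic behind that 2005 argument is simple enough to sketch. The short Python snippet below estimates the chance that a “statistically significant” finding reflects a real effect, given how plausible the hypothesis was to start with and how well powered the study is; the helper name, prior probabilities and power levels are illustrative assumptions of mine, not Ioannidis’s own figures.

    def prob_finding_is_true(prior, power, alpha=0.05):
        """Chance that a statistically significant result reflects a real effect.

        prior  - probability the hypothesis is true before the study is run
        power  - probability the study detects the effect if it is real
        alpha  - probability of a false positive when there is no effect
        """
        true_positives = power * prior
        false_positives = alpha * (1 - prior)
        return true_positives / (true_positives + false_positives)

    # Illustrative scenarios: long-shot hypotheses tested with underpowered
    # studies versus plausible hypotheses tested with well-powered ones.
    for prior, power in [(0.1, 0.2), (0.1, 0.8), (0.5, 0.8)]:
        ppv = prob_finding_is_true(prior, power)
        print(f"prior={prior}, power={power}: P(real | significant) = {ppv:.2f}")

On the pessimistic but hardly implausible first scenario (long-shot hypotheses tested with low power), fewer than a third of the “positive” findings are real, which is the substance of the claim.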
There are all sorts of reasons why this is happening. A lot of them have to do with financial and career pressures. Academics are evaluated by their publishing records, and while we might prefer quality to be more important than quantity, numbers are easier for administrators to compare. Companies are eager to find products they can sell, and what’s a little cherry-picked data among friends? The result is that it’s increasingly hard for any of us to know what research we can trust.
The good news is that science’s great strength is that, over time, it is self-correcting. The fact that these complaints are surfacing and that some researchers are devoting their careers to fixing the problems in their fields shows science at its worst and best. The worst is that scientific rigour has been allowed to decline to this extent. The best is that we can rebuild on a sounder footing for the future. Unfortunately, there is no getting back the money, careers, and time that have been wasted following dead ends. In some of the parapsychology cases it was possible to find these flaws funny. But in other fields there is a real price for bad science, and it’s paid in human lives.