Let's Talk Books And Politics: Medical Science and the Vanishing Truth

Jonah Lehrer has produced a fascinating and deeply troubling article in The New Yorker: “The Truth Wears Off.” He reports on a bizarre observation from experimental science: results that were once considered unassailable are failing the test of replicability. When an interesting result is published, its validity should not be assumed until it has been reproduced by another team. In the physical sciences it is common for one researcher’s success to goad others into trying to collect a little of the glory, either by verifying the result or, even better, by proving it wrong. This approach works quite well there, since the objects of study are generally inanimate and most of the funding comes from the government.

Tests of replicability are less common in the medical sciences, where experiments can be more difficult to control and quite costly. In addition, many studies are financed and carried out by people with a vested financial interest in the results, a system fraught with opportunities for abuse. The inaccuracies associated with medical research results were discussed previously in a post titled “Lies, Damned Lies, and Medical Science.”

Lehrer details the mechanisms by which inaccuracies can creep into any science, drawing many of his examples from medicine.

“On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects’ psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly’s Zyprexa was generating more revenue than Prozac. It remains the company’s top-selling drug.”

“But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. ‘In fact, sometimes they now look even worse,’ John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.”

“But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.”

Lehrer attributes this startling trend not to some mysterious physical phenomenon but to three rather mundane facts about humans and nature. The first is publication bias.

“In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze ‘temporal trends’ across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance.”

“Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for.... Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.”

Here is another way of looking at this dynamic.

“...after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.”
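To see how publication bias alone can manufacture a decline effect, here is a minimal Python sketch. It assumes a small true effect of 0.2 standard deviations, two-arm studies with 30 subjects per group, and a crude one-sided z-test cutoff of 1.96; these numbers and the names `run_study` and `published_effects` are hypothetical illustrations, not figures from Lehrer’s article. In the “new paradigm” era only positive, significant results reach print; in the “entrenched” era everything does.

```python
import random
import statistics

# Hypothetical, illustrative parameters (not taken from Lehrer's article).
TRUE_EFFECT = 0.2      # small real effect, in units of the noise standard deviation
SUBJECTS_PER_ARM = 30  # size of each group in a simulated two-arm study
NUM_STUDIES = 2000     # studies run in each era


def run_study(rng):
    """Simulate one two-arm study and return the observed difference in group means."""
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(SUBJECTS_PER_ARM)]
    control = [rng.gauss(0.0, 1.0) for _ in range(SUBJECTS_PER_ARM)]
    return statistics.fmean(treated) - statistics.fmean(control)


def published_effects(rng, only_positive_significant):
    """Return the effects that make it into print under a given publication policy."""
    # Standard error of a difference of two means with known unit variance.
    std_error = (2.0 / SUBJECTS_PER_ARM) ** 0.5
    published = []
    for _ in range(NUM_STUDIES):
        effect = run_study(rng)
        # Biased era: only results with z > 1.96 (significant and positive) are published.
        if not only_positive_significant or effect / std_error > 1.96:
            published.append(effect)
    return published


rng = random.Random(2010)
# New paradigm: journals favor positive, significant findings.
early = published_effects(rng, only_positive_significant=True)
# Entrenched paradigm: null and contrary results are now publishable too.
late = published_effects(rng, only_positive_significant=False)

print(f"early era: {len(early)} papers, mean effect {statistics.fmean(early):.2f}")
print(f"late era:  {len(late)} papers, mean effect {statistics.fmean(late):.2f}")
print(f"true effect: {TRUE_EFFECT}")
```

Nothing about the underlying effect changes between the two eras; only the filter does. Because every paper published in the early era must clear the significance cutoff, its average reported effect is more than double the true value, while the later, unfiltered literature settles near 0.2. The “truth” appears to wear off even though reality is constant.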

The second contribution to the phenomenon is even more troubling: selective reporting. This refers to the tendency of scientists, faced with a large mass of data, to preferentially select the data that agree with their preconceptions. This is often done unconsciously or inadvertently, but it happens.

“...selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the ‘shoehorning’ process.”

“One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.”
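One simple, mechanical form of selective reporting is measuring several outcomes and reporting whichever looks best. The Python sketch below assumes a treatment with no real effect, five independent outcome measures summarized as z-scores, and a one-sided cutoff of 1.96; the names `study_looks_positive` and `positive_rate` are hypothetical. It models only the statistical consequence, not whether the choice is deliberate or unconscious.

```python
import random
from statistics import NormalDist

# Hypothetical, illustrative setup: a treatment with NO real effect, evaluated on
# several independent outcome measures, each summarized as a z-score.
THRESHOLD = 1.96   # one-sided "significance" cutoff used for illustration
TRIALS = 20_000    # number of simulated null studies


def study_looks_positive(num_outcomes, rng):
    """Simulate one null study and report only its most favorable outcome."""
    z_scores = [rng.gauss(0.0, 1.0) for _ in range(num_outcomes)]
    return max(z_scores) > THRESHOLD


def positive_rate(num_outcomes, rng):
    """Fraction of null studies that end up reporting a 'significant' benefit."""
    hits = sum(study_looks_positive(num_outcomes, rng) for _ in range(TRIALS))
    return hits / TRIALS


rng = random.Random(1966)
honest = positive_rate(1, rng)         # one pre-specified outcome
cherry_picked = positive_rate(5, rng)  # best of five outcomes reported

# Closed form: P(max of k independent N(0,1) draws > t) = 1 - Phi(t) ** k
expected_cherry = 1 - NormalDist().cdf(THRESHOLD) ** 5

print(f"one outcome:  {honest:.3f}")
print(f"best of five: {cherry_picked:.3f} (theory {expected_cherry:.3f})")
```

With a single pre-specified outcome, about 2.5 per cent of useless treatments look effective; picking the best of five pushes that to roughly 12 per cent, with no fraud involved. Real outcomes are usually correlated, which would shrink the inflation somewhat, but the direction is the same.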

The author then brings in the results of the redoubtable Dr. Ioannidis:

“John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. ‘These exaggerations are why the decline has become so common,’ he says. ‘It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.’ In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the ‘gold standard’ of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.”

Lehrer has one more factor to discuss. He points out that many scientific results can be attributed to mere noise. By noise he means a combination of statistical excursions (chance fluctuations, which loom largest in small samples) and the effects of unknown, uncontrolled variables. Anyone who has ever used Monte Carlo methods to describe a physical system will be well aware of the fickleness of statistics. The existence of such unknown unknowns is why scientists insist on independent replication of results.
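The fickleness of small samples is easy to demonstrate with a tiny Monte Carlo experiment. This Python sketch assumes a true effect of 0.2, unit-variance noise, and 500 repeated studies at each of three sample sizes; the name `spread_of_estimates` and all parameter values are hypothetical. It shows how widely the estimates scatter and how often a small study gets even the sign of the effect wrong.

```python
import random
import statistics

# Hypothetical, illustrative setup: a true effect measured with unit-variance noise.
TRUE_EFFECT = 0.2
REPEATS = 500  # independent simulated studies per sample size


def spread_of_estimates(sample_size, rng):
    """Return the estimates and their standard deviation across repeated studies."""
    estimates = []
    for _ in range(REPEATS):
        sample = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(sample_size)]
        estimates.append(statistics.fmean(sample))
    return estimates, statistics.stdev(estimates)


rng = random.Random(35)
results = {}
for n in (10, 100, 1000):
    estimates, spread = spread_of_estimates(n, rng)
    # Fraction of studies whose estimate points the wrong way entirely.
    wrong_sign = sum(e < 0 for e in estimates) / REPEATS
    results[n] = (spread, wrong_sign)
    print(f"n={n:5d}: spread {spread:.3f} (theory {1 / n ** 0.5:.3f}), "
          f"wrong sign in {wrong_sign:.1%} of studies")
```

The spread shrinks only as one over the square root of the sample size, so with ten subjects roughly a quarter of studies point the wrong way, while with a thousand almost none do. Unknown, uncontrolled variables add further scatter that no simulation of this kind can capture, which is the case for independent replication.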

Lehrer provides a quote from a biologist named Palmer that will serve as an appropriate summary of this discussion.

“We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”

Unfortunately, in medical science the stakes are especially high: billions of dollars and the health of millions of people are involved.