The idea that four placebo pills are more powerful than two sounds magical – because it isn’t true


Mike Hall
Mike Hall has been working in software development for over twenty years, the last ten of those as Technical Lead and CTO. He is the producer and host of the long-running podcast Skeptics with a K, part of the organising committee for the award-winning skeptical conference QED, and the former president of the Merseyside Skeptics Society.


In 2001, the New England Journal of Medicine published a study titled Is The Placebo Powerless? An Analysis of Clinical Trials Comparing Placebo with No Treatment.

This was an influential review, prompting follow-ups in 2004 and 2010, the latter of these being for the Cochrane Collaboration. Each of these papers draws comparable conclusions: there is little evidence that placebos have powerful clinical effects, and the effects that are measured are difficult to distinguish from bias. Despite this, the narrative of the ‘powerful placebo effect’ persists.

One of the more common claims made for the strange and magical power of the placebo is that peptic ulcers heal faster when you take four placebo pills, rather than two. This is clearly an extraordinary claim. Four inert pills clear ulcers faster than two inert pills? Four times zero is greater than two times zero? Could this really be true?

The origin of these claims is a pair of papers by the medical anthropologist Daniel Moerman. The first, published in Medical Anthropology Quarterly in 1983, was titled General Medical Effectiveness and Human Biology: Placebo Effects in the Treatment of Ulcer Disease (“General Medical Effectiveness” apparently being Moerman’s preferred term for placebo effects, at least at the time).

In this analysis, Moerman searched the MEDLINE database for clinical trials published between 1977 and 1980 investigating the drug cimetidine, which is commonly prescribed for ulcers. Rather than looking at the effects of cimetidine itself, he focussed his attention on patients from the control groups. These patients had been given no active medication, but many had their ulcers cured anyway, with some studies showing recovery rates of up to 90%. Other studies showed geographical effects, with control patients in studies conducted in Germany far more likely to recover than their counterparts in, say, Denmark.

Moerman’s characterisation of these observations (that “therefore placebos are ‘stronger’ in Germany” and that “it is possible to heal ulcers in nearly all patients with inert treatment”) arguably strays into the territory of rhetoric. He also betrays his underlying assumptions when he claims placebo effects are patients “respond[ing] to the form, not the content, of the treatment […] a biological response to a symbolic stimulus”. Nevertheless, there are legitimate questions to ask. Why do placebos seem to work better in Germany than Denmark? Why do controls in some studies show a 90% recovery rate, when others show 10%?

Unfortunately, the paper only very briefly entertains what I believe to be the most parsimonious explanation: some patients took some other medication and didn’t tell their doctors about it.

Unreported variables


Moerman flirts with the idea that patients quietly taking antacids could account for the wide variation in results, before dismissing it because “there is even more evidence to suggest antacids are not effective”. However, these data were gathered before the widespread acceptance of H. pylori as the cause of most peptic ulcers. While patients may have been asked to refrain from taking antacids, or at least to record their antacid use, it is unlikely this extended to avoiding antibiotics. In those studies with an apparently large placebo effect, patients may have been inadvertently curing their ulcers while medicating for something else entirely. Of course, there is no way Moerman could have known this at the time, as Barry Marshall would not swallow his infamous bottle of H. pylori until the following year.

Moerman’s second prominent paper came sixteen years later. This time joined by several co-authors, “Placebo effect in the treatment of duodenal ulcer” was published in the British Journal of Clinical Pharmacology in December 1999. In the introduction, the authors reference Moerman’s earlier work on peptic ulcers and placebo, and go on to ask whether the frequency of placebo administration might influence the healing process.

To answer this question, they conducted a systematic review, searching the literature for double-blind, randomised, placebo-controlled trials of ulcer medication. The ulcers studied had to be uncomplicated (no perforation or bleeding), they had to be assessed by endoscope (a camera down the throat), and the endpoint of the study had to be a ‘healed ulcer’. The treatment had to be given as pills, administered either four times a day (at each meal and again at bedtime) or twice a day (morning and bedtime).

The result was 79 papers, from which the authors promptly discarded the active treatment data, and looked only at the controls. They found that after four weeks 44% of patients had ‘healed ulcers’ after taking placebo four times a day, while just 36% of patients had ‘healed ulcers’ after taking placebo twice a day. They concluded:

We found a relation between frequency of placebo administration and healing of duodenal ulcer. We realize that the comparison was based on nonrandomized data. However, we speculate that the difference between regimens was induced by the difference in frequency of placebo administration.

What I find interesting about this conclusion is that they state they found a relationship, but don’t claim that it is causal. They acknowledge that the data is not randomised, and that this is a limitation. And they characterise as ‘speculation’ the notion that the difference was induced by varying the frequency of administration. This is far more tentative, far more careful language, than you might hear when this claim is repeated today.

Not only that, but in the discussion the authors offer a number of other possible explanations. They point out that the “two pill” trials were generally more recent, having been conducted from 1980 onward, whereas the “four pill” trials were mostly earlier. A secondary analysis, which excluded data from before 1980, found no significant difference between the four-pill and two-pill regimens.
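To see how a difference in trial era alone could produce an apparent four-pill advantage, consider a toy example. Every number below is invented and is not taken from the 1999 paper, and the model assumes, purely for illustration, that placebo healing depends only on when a trial was run and not at all on the number of pills. Pooling every trial makes four pills look better; restricting to trials from 1980 onward, as the secondary analysis did, makes the gap vanish.

```python
# Hypothetical illustration only: every number here is invented.
# Assumption: placebo healing depends on the era a trial was run in,
# and not at all on how many placebo pills a day patients took.
HEALING_PERCENT_BY_ERA = {"pre-1980": 50, "1980-on": 30}

# Hypothetical placebo patients per (regimen, era): four-pill trials are
# mostly early, two-pill trials mostly later, as the authors describe.
PATIENTS = {
    ("four-pill", "pre-1980"): 800,
    ("four-pill", "1980-on"): 200,
    ("two-pill", "pre-1980"): 200,
    ("two-pill", "1980-on"): 800,
}


def healing_rate(regimen, era=None):
    """Pooled placebo healing rate for a regimen, optionally for one era only."""
    healed = total = 0
    for (reg, trial_era), n in PATIENTS.items():
        if reg != regimen or (era is not None and trial_era != era):
            continue
        healed += n * HEALING_PERCENT_BY_ERA[trial_era] // 100
        total += n
    return healed / total


for regimen in ("four-pill", "two-pill"):
    print(
        f"{regimen}: all trials {healing_rate(regimen):.0%}, "
        f"1980 onward only {healing_rate(regimen, '1980-on'):.0%}"
    )
```

This prints 46% against 34% for the pooled data, but 30% for both regimens once only the later trials are counted. The pill count never enters the calculation; the pooled difference comes entirely from which era each regimen's trials happened to fall in.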

They also point out that the trials are subject to selection bias, as different clinicians in different studies were responsible for deciding whether a patient met the study requirements, and that some (but not all) of the trials excluded patients who did not fully comply with the treatment plan.

However, the biggest source of bias, in my opinion, is one they don’t really mention, and it relates to the definition of ‘placebo effect’. This paper offers the following definition:

any therapeutic procedure which has an effect on a patient, symptom, syndrome or disease, but which is objectively without specific activity for the condition being treated. 

I would argue this is too narrow a definition, especially for the data from this study. A better definition might be “any effect measured after an intervention, other than a response to an active treatment.”

Control groups, and accounting for bias

As I mentioned in my previous article for The Skeptic, the reason control groups exist is to capture as many uncontrolled and non-specific effects as possible – regression to the mean, learned responses, the natural course of the disease, etc – so they can be subtracted from the effects measured in the final analysis and we can confidently attribute what is left to the specific effect of the intervention.

Most significantly, this also includes bias. It includes patients telling doctors what they think the doctors want to hear. It includes patients misremembering or misreporting their symptoms. It includes judgement calls made by doctors. And this is why our definition of “placebo effect” is important. The paper’s definition, a therapeutic effect on the patient, symptom, syndrome or disease, does not allow for effects which exist only in the data. And yet judgement calls by clinicians can create exactly such effects; they can change the data without changing the patient.

Let’s say you have two patients with peptic ulcers, and it’s the job of a pair of clinicians to perform an endoscopy to assess whether each ulcer has healed. And let’s also say the ulcers of those two patients are in absolutely identical condition, and that condition is: more-or-less healed.


The first clinician looks at the first patient, thinks “well, that’s more-or-less healed” and puts them down in the “healed” tally. The second clinician looks at the second patient, who is in exactly the same condition, says “no, not quite there” and puts them down in the “not healed” tally. The patients’ conditions are identical, but they are recorded differently in the data.

Within the context of a single trial, this isn’t so much of an issue. You will have the same researchers making the same (hopefully blinded) calls across the active treatment and placebo wings of the trial, so biases like this should cancel out.

But when you’re comparing the placebo wing of Trial A and the placebo wing of unrelated Trial B, the biases don’t cancel any more. Amongst other things, you’re now comparing the biases and judgement of the researchers who conducted Trial A, to the biases and judgement of the researchers who conducted Trial B. Even if their patients were in exactly the same condition, they could have recorded them differently based on their own opinions.
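A few lines of code make the cross-trial point concrete. The 0 to 100 “healing score”, the patients’ scores and the two clinicians’ thresholds are all invented for illustration; real endoscopic assessment is not reduced to a single number like this. Two trials enrol identical placebo patients, and the only thing that differs is where each trial’s clinicians draw the line for “healed”.

```python
# Hypothetical illustration: "healing scores" (0-100) for ten placebo patients.
# Both trials get exactly the same patients, in exactly the same condition.
PLACEBO_PATIENTS = [45, 55, 62, 68, 72, 75, 78, 82, 88, 95]

# Assumption: each trial's clinicians record an ulcer as "healed" once it
# reaches their own personal threshold on this invented scale.
THRESHOLD_TRIAL_A = 70  # lenient: "well, that's more-or-less healed"
THRESHOLD_TRIAL_B = 80  # strict: "no, not quite there"


def recorded_healing_rate(scores, threshold):
    """Fraction of patients a clinician with this threshold records as healed."""
    return sum(score >= threshold for score in scores) / len(scores)


for name, threshold in (("Trial A", THRESHOLD_TRIAL_A), ("Trial B", THRESHOLD_TRIAL_B)):
    rate = recorded_healing_rate(PLACEBO_PATIENTS, threshold)
    print(f"{name}: {rate:.0%} of placebo patients recorded as healed")
```

Trial A reports 60% healed and Trial B reports 30%, from identical patients. Within either trial, the same threshold would be applied to the placebo and active arms alike, so it would not favour one arm over the other; it is only when placebo wings from different trials are set side by side that the gap starts to look like a treatment effect.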

The difference is then ascribed by later authors to a powerful and magical effect from the inert placebo, and this claim becomes one of those things we just know about placebos. We know that four pills a day are better than two, and can confidently assert as much. Because of a flawed comparison of the biases of ulcer researchers some 40 years ago.
