UNDERSTANDING RESEARCH: What do we actually mean by research and how does it help inform our understanding of things? Today we look at the dangers of making a link between unrelated results.
Here’s an historical tidbit you may not be aware of. Between the years 1860 and 1940, as the number of Methodist ministers living in New England increased, so too did the amount of Cuban rum imported into Boston – and they both increased in an extremely similar way. Thus, Methodist ministers must have bought up lots of rum in that time period!
Actually no, that’s a silly conclusion to draw. What’s really going on is that both quantities – Methodist ministers and Cuban rum – were driven upwards by other factors, such as population growth.
In reaching that incorrect conclusion, we’ve made the far-too-common mistake of confusing correlation with causation.
What’s the difference?
Two quantities are said to be correlated if both increase and decrease together (“positively correlated”), or if one increases when the other decreases and vice-versa (“negatively correlated”).
Correlation is usually measured with Pearson’s correlation coefficient, which indicates how tightly locked together the two quantities are, ranging from -1 (perfectly negatively correlated) through 0 (no linear correlation at all) and up to 1 (perfectly positively correlated).
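To make the coefficient concrete, here is a minimal sketch of how it can be computed by hand. The function name pearson_r and the three tiny data sets are invented for illustration; real analyses would normally rely on a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two quantities around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)

rising = [1, 2, 3, 4, 5]
r_positive = pearson_r(rising, [2, 4, 6, 8, 10])   # rises in lockstep
r_negative = pearson_r(rising, [10, 8, 6, 4, 2])   # falls as the other rises
r_none = pearson_r(rising, [1, -1, 0, -1, 1])      # no linear relationship

print(r_positive, r_negative, r_none)
```

The three results sit at the two ends and the middle of the scale. Note that the coefficient only captures straight-line relationships, so a value near 0 does not rule out some other kind of pattern.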
But just because two quantities are correlated does not necessarily mean that one is directly causing the other to change. Correlation does not imply causation, just like cloudy weather does not imply rainfall, even though the reverse is true.
If two quantities are correlated then there might well be a genuine cause-and-effect relationship (such as rainfall levels and umbrella sales), but maybe other variables are driving both (such as pirate numbers and global warming), or perhaps it’s just coincidence (such as US cheese consumption and strangulations-by-bedsheet).
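The "other variables driving both" case can be sketched with a small simulation. Everything here is invented for illustration (the population figures, the 100 and 50 multipliers, the noise levels and the helper names); the point is only that two quantities which never influence each other can still correlate strongly when both follow a third. The partial correlation used below is one standard way of controlling for that third variable.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial_r(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so runs are repeatable
years = range(1860, 1941)

# Invented numbers: a population (in millions) that grows steadily.
population = [1.0 + 0.05 * (year - 1860) for year in years]

# Each quantity depends on population plus its own independent noise.
# Neither depends on the other.
ministers = [100 * p + rng.gauss(0, 10) for p in population]
rum_barrels = [50 * p + rng.gauss(0, 5) for p in population]

raw = pearson_r(ministers, rum_barrels)
controlled = partial_r(ministers, rum_barrels, population)

print(f"raw correlation:               {raw:.2f}")
print(f"controlling for population:    {controlled:.2f}")
```

In this toy setting the raw correlation comes out very close to 1, while the correlation left over once population is accounted for is far weaker. Real data are rarely this tidy, since the relevant third variable is often unknown or unmeasured.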
Even where causation is present, we must be careful not to mix up the cause with the effect, or else we might conclude, for example, that an increased use of heaters causes colder weather.
In order to establish cause-and-effect, we need to go beyond the statistics and look for separate evidence (of a scientific or historical nature) and logical reasoning. Correlation may prompt us to go looking for such evidence in the first place, but it is by no means a proof in its own right.
Although the above examples were obviously silly, correlation is very often mistaken for causation in ways that are not immediately obvious in the real world. When reading and interpreting statistics, one must take great care to understand exactly what the data and its statistics are implying – and more importantly, what they are not implying.
One recent example of the need for caution in interpreting data is the excitement earlier this year surrounding the apparently groundbreaking detection of gravitational waves – an announcement that appears to have been made prematurely, before all the variables that were affecting the data were accounted for.
Unfortunately, analysing statistics, probabilities and risks is not a skill set wired into our human intuition, and so it is all too easy to be led astray. Entire books have been written on the subtle ways in which statistics can be misinterpreted (or used to mislead). To help keep your guard up, here are some common slippery statistical problems that you should be aware of:
1) The Healthy Worker Effect, where sometimes two groups cannot be directly compared on a level playing field.
Consider a hypothetical study comparing the health of a group of office-workers with the health of a group of astronauts. If the study shows no significant difference between the two – no correlation between healthiness and working environment – are we to conclude that living and working in space carries no long-term health risks for astronauts?
No! The groups are not on the same footing: the astronaut corps screen applicants to find healthy candidates, who then maintain a comprehensive fitness regime in order to proactively combat the effects of living in “microgravity”.
We would therefore expect them to be significantly healthier than office workers, on average, and should rightly be concerned if they were not.
2) Categorisation and the Stage Migration Effect – shuffling people between groups can have dramatic effects on statistical outcomes.
This is also known as the Will Rogers effect, after the US comedian who reportedly quipped:
When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.
To illustrate, imagine dividing a large group of friends into a “short” group and a “tall” group (perhaps in order to arrange them for a photo). Having done so, it’s surprisingly easy to raise the average height of both groups at once.
Simply ask the shortest person in the “tall” group to switch over to the “short” group. The “tall” group lose their shortest member, thus bumping up their average height – but the “short” group gain their tallest member yet, and thus also gain in average height.
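The arithmetic is easy to check with a handful of made-up heights. The six values below (in centimetres) and the variable names are purely illustrative.

```python
def mean(values):
    return sum(values) / len(values)

# Invented heights in centimetres.
short_group = [150, 155, 160]
tall_group = [170, 180, 190]

short_before = mean(short_group)
tall_before = mean(tall_group)

# The shortest member of the "tall" group switches sides.
mover = min(tall_group)
tall_after_move = sorted(tall_group)[1:]    # everyone except the mover
short_after_move = short_group + [mover]

short_after = mean(short_after_move)
tall_after = mean(tall_after_move)

# Nobody's height changed, so the overall average cannot change either.
overall_before = mean(short_group + tall_group)
overall_after = mean(short_after_move + tall_after_move)

print(f"short group: {short_before} -> {short_after}")
print(f"tall group:  {tall_before} -> {tall_after}")
print(f"everyone:    {overall_before} -> {overall_after}")
```

Both group averages rise (155 to 158.75 and 180 to 185) while the average across all six friends stays at 167.5. Only the labels moved.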
This has major implications in medical studies, where patients are often sorted into “healthy” or “unhealthy” groups in the course of testing a new treatment. If diagnostic methods improve, some very-slightly-unhealthy patients may be recategorised – leading to the health outcomes of both groups improving, regardless of how effective (or not) the treatment is.
3) Data mining – when an abundance of data is present, bits and pieces can be cherry-picked to support any desired conclusion.
This is bad statistical practice, but if done deliberately can be hard to spot without knowledge of the original, complete data set.
Consider the above graph showing two interpretations of global warming data, for instance. Or fluoride – in small amounts it is one of the most effective preventative medicines in history, but the positive effect disappears entirely if one only ever considers toxic quantities of fluoride.
For similar reasons, it is important that the procedures for a given statistical experiment are fixed in place before the experiment begins and then remain unchanged until the experiment ends.
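A rough sketch shows how easily cherry-picking manufactures a result. Here one random "target" series is compared against 200 other random series, none of which has anything to do with it; the series lengths, counts and names are all arbitrary choices made for this illustration.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(1)  # fixed seed so runs are repeatable
n_points = 10           # e.g. ten yearly measurements
n_candidates = 200      # unrelated quantities available to pick from

# Pure noise: no series is connected to any other.
target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_candidates)]

correlations = [pearson_r(target, series) for series in candidates]

# The cherry-picker reports only the most striking match.
best = max(correlations, key=abs)
# The honest summary looks at all of them.
typical = sum(abs(r) for r in correlations) / n_candidates

print(f"strongest correlation found: {best:.2f}")
print(f"typical strength (mean |r|): {typical:.2f}")
```

The hand-picked winner looks impressive even though every series is noise, which is one reason the analysis plan should be fixed before the data are examined.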
4) Clustering – which is to be expected even in completely random data.
Consider a medical study examining how a particular disease, such as cancer or multiple sclerosis, is geographically distributed. If the disease strikes at random (and the environment has no effect) we would expect to see numerous clusters of patients as a matter of course. If patients were spread out perfectly evenly, the distribution would be most un-random indeed!
So the presence of a single cluster, or a number of small clusters of cases, is entirely normal. Sophisticated statistical methods are needed to determine just how much clustering is required to deduce that something in that area might be causing the illness.
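To see how readily clusters arise by chance, here is a small simulation that scatters 100 cases completely at random over a map divided into 100 equal districts. The grid, the case count and the variable names are assumptions made for the sake of the example.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so runs are repeatable
n_cases = 100
grid_size = 10           # 10 x 10 = 100 equal districts

# Each case lands in a district chosen uniformly at random.
cases = [(rng.randrange(grid_size), rng.randrange(grid_size))
         for _ in range(n_cases)]

counts = Counter(cases)
largest_cluster = max(counts.values())
empty_districts = grid_size * grid_size - len(counts)

print(f"average cases per district: {n_cases / grid_size ** 2:.1f}")
print(f"largest cluster:            {largest_cluster} cases")
print(f"districts with no cases:    {empty_districts}")
```

Although the average is exactly one case per district, random placement typically leaves many districts empty and packs several cases into a few others. A district with three or four times the average rate is therefore not, on its own, evidence of a local cause.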
Unfortunately, any cluster at all – even a non-significant one – makes for an easy (and at first glance, compelling) news headline.
Statistical analysis, like any other powerful tool, must be used very carefully – and in particular, one must always be careful when drawing conclusions based on the fact that two quantities are correlated.
Instead, we must always insist on separate evidence to argue for cause-and-effect – and that evidence will not come in the form of a single statistical number.
Seemingly compelling correlations, say between given genes and schizophrenia or between a high fat diet and heart disease, may turn out to be based on very dubious methodology.
We are perhaps as a species cognitively ill prepared to deal with these issues. As Canadian educator Kieran Egan put it in his book Getting it Wrong from the Beginning:
The bad news is that our evolution equipped us to live in small, stable, hunter-gatherer societies. We are Pleistocene people, but our languaged brains have created massive, multicultural, technologically sophisticated and rapidly changing societies for us to live in.
In consequence, we must constantly resist the temptation to see meaning in chance and to confuse correlation and causation.
This article is part of a series on Understanding Research.
Journalists are constantly being reminded that “correlation doesn’t imply causation;” yet, conflating the two remains one of the most common errors in news reporting on scientific and health-related studies. In theory, these are easy to distinguish—an action or occurrence can cause another (such as smoking causes lung cancer), or it can correlate with another (such as smoking is correlated with high alcohol consumption). If one action causes another, then they are most certainly correlated. But just because two things occur together does not mean that one caused the other, even if it seems to make sense.
Unfortunately, intuition can lead us astray when it comes to distinguishing between the two. For example, eating breakfast has long been correlated with success in school for elementary school children. It would be easy to conclude that eating breakfast causes students to be better learners. Is this a causal relationship—does breakfast by itself create better students? Or is it only a correlation: perhaps not having breakfast correlates highly with other challenges in kids’ lives that make them poorer students, such as less educated parents, worse socio-economic status, less focus on school at home, and lower expectations.
It turns out that kids who don’t eat breakfast are also more likely to be absent or tardy, and absenteeism plays a significant role in their poor performance. This may lead one to believe that there is not a causal relationship. Yet breakfast may encourage kids to come to school (and on time), which in turn improves their performance. In a recent literature review, there were mixed results suggesting that the advantages of breakfast depend on the population, the type of breakfast provided, and the measurement of “benefit” for the kids. Breakfast seems to have an overall positive impact on cognitive performance, especially memory tasks and focus. Not surprisingly, the benefit seems greater for kids who are undernourished. But the clear message here is that a causal relationship has been extremely hard to establish, and remains in question.
Many studies are designed to test a correlation, but cannot possibly lead us to a causal conclusion; and yet, obvious “reasons” for the correlation abound, tempting us toward a potentially incorrect conclusion. People learn of a study showing that “girls who watch soap operas are more likely to have eating disorders”— a correlation between soap opera watching and eating disorders—but then they incorrectly conclude that watching soap operas gives girls eating disorders. It is entirely possible that girls who are prone to eating disorders are also attracted to soap operas.
There are several reasons why common sense conclusions about cause and effect might be wrong. Correlated occurrences may be due to a common cause. For example, the fact that red hair is correlated with blue eyes stems from a common genetic specification that codes for both. A correlation may also be observed when there is causality behind it—for example, it is well established that cigarette smoking not only correlates with lung cancer but actually causes it. But in order to establish cause, we have to rule out the possibility that smokers are more likely to live in urban areas, where there is more pollution—and any other possible explanation for the observed correlation.
In many cases, it seems obvious that one action causes another; however, there are also many cases when it is not so clear (except perhaps to the already-convinced observer). In the case of soap-opera-watching girls with eating disorders, we can neither exclude nor embrace the hypothesis that the television is a cause of the problem; additional research would be needed to make a convincing argument for causality. Another hypothesis might be that girls inclined to suffer poor body image are drawn to soap operas on television because it satisfies some need related to their poor body image. Or it could be that neither causes the other, but rather there is a common trait (say, an overemphasis on appearance in the girls’ environment) that causes both an interest in soap operas and an inclination to develop eating disorders. None of these hypotheses is tested in a study that simply asks who is watching soaps and who is developing eating disorders, and finds a correlation between the two.
How, then, does one ever establish causality? This is one of the most daunting challenges facing public health professionals and pharmaceutical companies. The most effective way of doing this is through a controlled study. In a controlled study, two groups of people who are comparable in almost every way are given two different sets of experiences (such as one group watching soap operas and the other game shows), and the outcomes are compared. If the two groups have substantially different outcomes, then the different experiences may have caused the different outcomes.
There are obvious ethical limits to controlled studies: it would be problematic to take two comparable groups and make one smoke while denying cigarettes to the other in order to see if cigarette smoking really causes lung cancer. This is why epidemiological (or observational) studies are so important. These are studies in which large groups of people are followed over time, and their behavior and outcomes are observed. In these studies, it is extremely difficult (though sometimes still possible) to tease out cause and effect, versus a mere correlation.
Typically, one can only establish a causal relationship if the effects are extremely notable and there is no reasonable explanation that challenges causality. This was the case with cigarette smoking, for example. At the time that scientists, industry trade groups, activists and individuals were debating whether the observed correlation between heavy cigarette smoking and lung cancer was causal or not, many other hypotheses were considered (such as sleep deprivation or excessive drinking) and each one dismissed as insufficiently describing the data. It is now a widespread belief among scientists and health professionals that smoking does indeed cause lung cancer.
When the stakes are high, people are much more likely to jump to causal conclusions. This seems to be doubly true when it comes to public suspicion about chemicals and environmental pollution. There has been a lot of publicity over the purported relationship between autism and vaccinations, for example. As vaccination rates went up across the United States, so did autism. And if you slice the data in just the right way, it looks like some kids with autism have had more vaccinations. However, this correlation (which has led many to conclude that vaccination causes autism) has been widely dismissed by public health experts. The rise in autism rates likely has to do with increased awareness and diagnosis, or one of many other possible factors that have changed over the past 50 years.
Language further contorts the distinction, as some media outlets use words that imply causality without saying it. A recent example in Oklahoma occurred when its Governor, Mary Fallin, said there was a “direct correlation” between a recent increase in earthquakes and wastewater disposal wells. She would have liked to say that the wells caused the earthquakes, but the research only shows a correlation. Rather than misspeak, she embellished “correlation” with “direct” so that it sounds causal.
At times, a correlation does not have a clear explanation, and at other times we fill in the explanation. A recent news story reports that housing prices in D.C. correlate with reading proficiency. Many stories can be crafted to explain the phenomenon, but most people would be reluctant to conclude that a child’s reading proficiency could cause the price of their house to be higher or lower, or vice-versa. In contrast, a news story reporting that “30 years of research found a positive correlation between family involvement and a student’s academic success” in Florida feels like it has the weight of causality. The big difference between these two different correlations is our own belief in a likely mechanism for family to contribute to better grades.
In general, we should all be wary of our own bias: we like explanations. The media often infer a causal relationship among correlated observations when causality was not even considered by the study itself. Without clear reasons to accept causality, we should only accept the existence of a correlation. Two events occurring in close proximity does not imply that one caused the other, even if it seems to make perfect sense.
Rebecca Goldin is Professor of Mathematical Sciences at George Mason University and Director of STATS.org. She received her undergraduate degree from Harvard University and her Ph.D. from the Massachusetts Institute of Technology. She taught at the University of Maryland as a National Science Foundation postdoctoral fellow before joining George Mason in 2001. Her academic research is in symplectic geometry, group actions and related combinatorics. In 2007, she received the Ruth I. Michler Memorial Prize, presented by the Association for Women in Mathematics. Goldin is supported in part by NSF grant #1201458.