Genome-wide association studies are unfit for causal inference

Genome-wide association studies (GWAS) are observational studies that scan a genome-wide set of genetic variants across many individuals to see whether any variant is associated with a trait. This type of study is relatively recent, and it shows how far genotyping technology and computing have come: researchers can now study genome-wide data from hundreds of thousands of individuals, and in some cases over a million.

However, interesting as these new studies are, one has to keep in mind that they are observational. In other words, they are correlational studies: they let us find which gene variants correlate with having a certain phenotype. But as anyone who has taken an introductory class in statistics knows, correlation does not imply causation.

Despite all of this, to my surprise, Robert Plomin, an eminent behavioural geneticist, has made the claim that “Predictions from polygenic scores are an exception to the rule that correlations do not imply causation” in his book Blueprint. This is not true. What is probably happening is that Plomin is exaggerating his findings to acquire recognition. In this piece, I will show that GWAS are not causal, give examples of how they can be confounded and then proceed to provide some closing thoughts on this matter.

GWAS are confounded by population stratification

Let’s imagine, for argument’s sake, that Swedish-Americans earn significantly more money than the average American. Let’s also assume that they do so for purely cultural reasons (a Protestant work ethic, less ostentatious spending than their Mediterranean counterparts, etc.). Since Swedish-Americans represent a genetically distinct group, their distinctive gene variants will also tend to be associated with higher income, despite there being no causal link. This would be an example of population stratification confounding a GWAS.

Indeed, population stratification is defined as a difference in allele frequencies between sub-populations of a population as a result of non-random mating between individuals. I used the example of ethnicity above, but it can also apply to class endogamy or to any genetically distinct group. The issue is that sub-populations that share genes will also tend to share a culture and an environment. This makes it hard to disentangle these factors when trying to assess the cause of a group’s outcome, be it social or health-related.
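To make the mechanism concrete, here is a minimal simulation sketch of the kind of scenario described above. Everything in it is an assumption made up for illustration: the group sizes, the allele frequencies, the income figures and the helper names (`pearson`, `simulate_group`) are hypothetical and come from no real study. By construction the allele has no effect on income at all, yet pooling the two groups should produce a clear correlation, while the correlation inside each group should stay close to zero.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_group(rng, n, allele_freq, mean_income):
    """Genotypes (0, 1 or 2 copies of the allele) and incomes for one group.

    Income depends only on the group mean (standing in for culture),
    never on the genotype: the allele has no causal effect by construction.
    """
    genotypes = [(rng.random() < allele_freq) + (rng.random() < allele_freq)
                 for _ in range(n)]
    incomes = [rng.gauss(mean_income, 10.0) for _ in range(n)]
    return genotypes, incomes

rng = random.Random(0)
# Hypothetical numbers: a minority group where the allele is common
# and income is higher, and a majority group where both are lower.
g_a, y_a = simulate_group(rng, 2000, allele_freq=0.8, mean_income=70.0)
g_b, y_b = simulate_group(rng, 8000, allele_freq=0.2, mean_income=50.0)

r_pooled = pearson(g_a + g_b, y_a + y_b)   # what a naive pooled analysis sees
r_within_a = pearson(g_a, y_a)             # same allele, inside group A only
r_within_b = pearson(g_b, y_b)             # same allele, inside group B only

print(f"pooled correlation:   {r_pooled:.2f}")
print(f"within-group A and B: {r_within_a:.2f}, {r_within_b:.2f}")
```

The pooled association here comes entirely from group membership, which is exactly what a stratified sample hides from a naive analysis.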

Now that you have an idea of how GWAS can get confounded, let us look at a few concrete examples of GWAS that are very likely confounded.

Genes associated with ice cream flavor preference?

The now famous DNA testing company 23andMe has conducted a GWAS in which they claim to have found genes associated with ice cream flavor preference. Although I don’t believe it’s impossible for genes to influence our sense of taste and our preferences, it seems impossible to me that it is our DNA that determines which artificial ice cream flavor we prefer. Throughout most of our evolutionary history none of these flavors were available, and certainly not artificial copies of these flavors. What is probably happening here is that the study is picking up cultural groups that have a certain preference, or perhaps the study is simply not good and not reproducible.

A GWAS finds genes associated with walking pace

Another study found genes that explain roughly 9% of the variance in walking pace after controlling for body mass index. The individuals included range from 40 to 69 years old. One might think that age was the confounding variable, but the authors state that they controlled for it as well, along with other things. Nonetheless, controlling for confounds in a regression (or in a GWAS) does not constitute a true causal method, because it only removes the confounds that were measured and included. When we lack an alternative and have a good idea of what the possible confounds are, such a method might be used to make a decision, albeit with a lower level of confidence than an RCT. But in this case, population stratification can happen in so many different ways that it is not warranted, in my opinion, to make causal claims from this data. It should also be noted that this study uses UK Biobank data, which has been reported to have stratification problems.
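The point about regression adjustment can be illustrated with a small sketch. It is a toy model under assumptions of my own, not a reconstruction of the walking pace study: the variable names, effect sizes and the unmeasured `group` variable are all hypothetical. The outcome depends on age (measured) and on group membership (unmeasured), and the genotype has exactly zero effect. Adjusting for age should leave the spurious genotype slope essentially untouched; only adjusting for the group, which a real analyst may not be able to observe, should bring it close to zero.

```python
import random

def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """Residuals of ys after regressing out xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]

def adjusted_slope(g, y, covariate):
    """Slope of y on g after adjusting both for one covariate.

    By the Frisch-Waugh-Lovell theorem this equals the coefficient on g
    in a regression of y on g and the covariate together.
    """
    return slope(residuals(covariate, g), residuals(covariate, y))

rng = random.Random(1)
n = 10000
group = [int(rng.random() < 0.3) for _ in range(n)]    # unmeasured sub-population
age = [rng.uniform(40, 69) for _ in range(n)]          # measured covariate
freq = [0.7 if u else 0.3 for u in group]              # allele frequency by group
geno = [(rng.random() < f) + (rng.random() < f) for f in freq]
# Hypothetical outcome: depends on age and group only, genotype effect is zero.
pace = [5.0 - 0.02 * a + 0.5 * u + rng.gauss(0, 0.5)
        for a, u in zip(age, group)]

b_naive = slope(geno, pace)                  # no adjustment
b_age = adjusted_slope(geno, pace, age)      # adjusted for the measured covariate
b_group = adjusted_slope(geno, pace, group)  # adjusted for the true confounder

print(f"naive: {b_naive:.3f}  age-adjusted: {b_age:.3f}  group-adjusted: {b_group:.3f}")
```

In this toy model the adjustment works only when the analyst happens to measure the right variable, which is the crux of the objection above.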

Alleles correlated with which side of the face you use your phone on

To give a last reductio ad absurdum argument, another GWAS found gene associations for using your cellphone on the left or right side of the face. As you have seen so far, these studies can yield truly absurd results and should always be interpreted critically. Now, how can we show that GWAS are actually confounded?

Heritability reduced by in-family GWAS

A possible way to partly control for population stratification is to use in-family GWAS. Indeed, members of a family, although they can experience very different environments, will tend to share an ethnicity, a culture and a social class, among other things. This paper shows that using such a methodology instead of classical GWAS decreases the heritability estimates significantly. This has been shown for height, IQ, educational attainment, smoking and more. What this study suggests is that most of the heritability estimates derived from previous GWAS that did not use the within-sibling methodology are overestimated. One can conclude not only that GWAS can be confounded, but that most are. Add to that the fact that in-family GWAS are not perfect and that environments can differ widely even within a family, so even the heritability estimates derived this way are probably too high.
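As a rough illustration of why comparing siblings helps, here is a simulation sketch of a sibling-difference design. It is not the method of the paper mentioned above, only a toy version under loudly stated assumptions: the effect size `true_effect`, the group frequencies and the `family_env` term are hypothetical, and the environment is assumed to be fully shared between siblings, which is precisely the idealisation that real families violate. Under these assumptions the pooled estimate should be inflated well above the true effect, while the within-family estimate should land near it.

```python
import random

def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def child_genotype(rng, mother, father):
    """One randomly transmitted allele from each parent."""
    return rng.choice(mother) + rng.choice(father)

rng = random.Random(2)
true_effect = 0.1          # hypothetical causal effect per allele copy
g1, g2, y1, y2 = [], [], [], []
for _ in range(20000):     # one iteration per family with two siblings
    in_group = rng.random() < 0.3                 # shared by the whole family
    f = 0.7 if in_group else 0.3                  # allele frequency of the group
    mother = [int(rng.random() < f) for _ in range(2)]
    father = [int(rng.random() < f) for _ in range(2)]
    family_env = 1.0 if in_group else 0.0         # culture, class, and so on
    for g_list, y_list in ((g1, y1), (g2, y2)):
        g = child_genotype(rng, mother, father)
        g_list.append(g)
        y_list.append(true_effect * g + family_env + rng.gauss(0, 1.0))

# Classical estimate: pool everyone and ignore family structure.
b_population = slope(g1 + g2, y1 + y2)

# Within-family estimate: regress sibling outcome differences on
# sibling genotype differences, so anything shared by the family cancels.
dg = [a - b for a, b in zip(g1, g2)]
dy = [a - b for a, b in zip(y1, y2)]
b_within = slope(dg, dy)

print(f"true: {true_effect}  population: {b_population:.2f}  within-family: {b_within:.2f}")
```

The within-family estimate is clean here only because the simulated siblings share their entire environment. If part of the environment differed between siblings in a way that tracked genotype, this estimate would be biased too.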

Closing thoughts

GWAS are a recent technology that definitely has potential. If we can figure out which diseases do and do not have a genetic component, and to what extent, it will enable medicine to start imagining new treatments accordingly. Nonetheless, one should keep one’s stats 101 course in mind: correlation is not causation. New and improved GWAS methods, such as the in-family method, will most likely keep emerging, enabling us to control for more and more possible confounds. Be wary, however: even that does not constitute a robust causal method, though it will at least get us a bit closer to the answers we want. Perhaps one day we will develop true genetic causal methods, although at the moment I have no idea how this could be possible. Will future science prove me wrong?