Genome wide association studies are unfit for causal inference

Genome wide association studies (GWAS) are observational studies of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. This type of study is relatively new and shows how far genomics and computing have come, allowing the genomes of hundreds of thousands of individuals, if not over a million, to be analyzed in a single study.

However, although these new studies are very interesting, one has to keep in mind that they are observational. In other words, they are correlation studies: they enable us to find which gene variants correlate with having a certain phenotype. But as anyone who has taken an introductory class in statistics knows, correlation does not entail causality.

Despite all of this, to my surprise, Robert Plomin, an eminent behavioural geneticist, claims in his book Blueprint that “Predictions from polygenic scores are an exception to the rule that correlations do not imply causation”. This is not true. What is probably happening is that Plomin is overstating his findings to gain recognition. In this piece, I will show that GWAS are not causal, give examples of how they can be confounded, and then offer some closing thoughts on the matter.

GWAS are confounded by population stratification

Let’s imagine, for argument’s sake, that Swedish-Americans earn significantly more money than the average American. Let’s also assume that they do so for purely cultural reasons (Protestant work ethic, avoidance of ostentatious spending unlike their Mediterranean counterparts, etc.). Since Swedish-Americans represent a genetically distinct group, their distinctive gene variants will also tend to be associated with higher income, despite there being no causal link. This would be an example of population stratification confounding a GWAS.

Indeed, population stratification is defined as a difference in allele frequencies between sub-populations within a population, resulting from non-random mating between individuals. I used the example of ethnicity above, but it also applies to class endogamy or any other genetically distinct grouping. The issue is that sub-populations that share genes will also tend to share a culture and an environment, which makes it hard to disentangle these factors when trying to identify the causes of a group’s outcomes, be they social or health related.
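
To make the mechanism concrete, here is a minimal simulation sketch (entirely synthetic data, with hypothetical group labels and numbers): a neutral allele that does nothing ends up correlated with income simply because its frequency differs between two sub-populations whose incomes differ for cultural reasons.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two hypothetical sub-populations: group 1 has a higher frequency of a
# neutral allele AND a higher mean income, for purely cultural reasons.
group = rng.integers(0, 2, size=n)                           # 0 or 1
freq = np.where(group == 1, 0.6, 0.3)                        # allele frequencies differ
genotype = rng.binomial(2, freq)                             # 0, 1 or 2 copies
income = 40_000 + 15_000 * group + rng.normal(0, 10_000, n)  # culture only

# Naive GWAS-style association across the whole sample: clearly positive,
# even though the allele has no effect on income whatsoever.
print(np.corrcoef(genotype, income)[0, 1])

# Condition on the sub-population and the "association" disappears.
for g in (0, 1):
    mask = group == g
    print(g, round(np.corrcoef(genotype[mask], income[mask])[0, 1], 3))
```

Real GWAS try to correct for this kind of structure, typically by including principal components of the genotype data as covariates, but the correction is only as good as the structure those components happen to capture.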

Now that you have an idea of how GWAS can get confounded, let us look at a few concrete examples of GWAS that are very likely confounded.

Genes associated with ice cream flavor preference?

The now famous DNA testing company 23andMe has conducted a GWAS in which it claims to have found genes associated with ice cream flavor preference. Although I don’t believe it’s impossible for genes to influence our sense of taste and our preferences, it seems impossible to me that our DNA determines which artificial ice cream flavor we prefer. Throughout most of our evolutionary history, none of these flavors were available, and certainly not artificial copies of them. What is probably happening here is that the study is picking up cultural groups that share a certain preference, or perhaps the study is simply weak and not reproducible.

A GWAS finds genes associated with walking pace

Another study found genes that explain roughly 9% of the variance in walking pace after controlling for body mass index. The individuals included range from 40 to 69 years old; one might think age is the confounding variable, but the authors claim to have controlled for it along with other factors. Nonetheless, even controlling for confounds in a regression (or in a GWAS) does not constitute a true causal method. When we lack an alternative and have a good idea of what the possible confounds are, such a method might be used to make a decision, albeit with a lower level of confidence than an RCT. But in this case, population stratification can arise in so many different ways that it is not warranted, in my opinion, to make causal claims from these data. It should also be noted that this study uses UK Biobank data, which has been reported to have stratification problems.
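
For illustration, here is a rough sketch, on synthetic data with made-up effect sizes, of how covariate adjustment works in this kind of analysis: each variant is tested in a linear model that also includes the measured covariates. The adjustment only removes confounding that flows through the covariates you happen to include, which is exactly why it falls short of a causal method.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000

# Synthetic covariates and one synthetic variant (all numbers are made up).
age = rng.uniform(40, 69, n)
bmi = rng.normal(27, 4, n)
genotype = rng.binomial(2, 0.4, n)

# Walking pace depends on age and BMI here, not on the variant.
pace = 5.0 - 0.02 * (age - 40) - 0.05 * (bmi - 27) + rng.normal(0, 0.5, n)

# Per-variant test with covariate adjustment: pace ~ genotype + age + bmi.
X = sm.add_constant(np.column_stack([genotype, age, bmi]))
fit = sm.OLS(pace, X).fit()
print("adjusted genotype effect:", fit.params[1], "p =", fit.pvalues[1])
# Any confounder *not* in X (e.g. fine-grained ancestry) can still bias this.
```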

Alleles correlated with which side of the face you use your phone on

To give a last reductio ad absurdum argument, another GWAS found gene associations for using your cellphone on the left or right side of the face… As you have seen so far, these studies can yield truly absurd results and should always be interpreted critically. Now, how can we prove that GWAS are actually confounded?

Heritability estimates are reduced by within-family GWAS

A possible way to partly control for population stratification is to use within-family GWAS. Members of a family, although they can experience very different environments, will tend to share an ethnicity, a culture and a social class, among other things. This paper shows that using such a methodology instead of a classical GWAS decreases heritability estimates significantly. This has been shown for height, IQ, educational attainment, smoking and more. What this study suggests is that most heritability estimates derived from previous GWAS that did not use the within-sibling methodology are overestimates. One can conclude not only that GWAS can be confounded, but that most are. Add to that the fact that within-family GWAS are not perfect, and that even within a family environments can differ widely, so even the heritability estimates derived this way are probably too high.
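
Here is a toy sketch of why the within-family design helps (purely synthetic data, with no causal effect of the allele at all): families differ both in allele frequency and in shared environment, which biases the naive population-level estimate, whereas differencing two siblings cancels out everything the family shares.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fam = 20_000

family_env = rng.normal(0, 1, n_fam)                  # shared culture, class, ancestry
freq = np.clip(0.5 + 0.15 * family_env, 0.05, 0.95)   # stratification: frequency tracks environment

g1 = rng.binomial(2, freq)                            # sibling 1 genotype
g2 = rng.binomial(2, freq)                            # sibling 2 genotype
t1 = 2.0 * family_env + rng.normal(0, 1, n_fam)       # trait: the allele does nothing
t2 = 2.0 * family_env + rng.normal(0, 1, n_fam)

# Naive population-level estimate (pooling all siblings): spuriously positive.
g_all, t_all = np.concatenate([g1, g2]), np.concatenate([t1, t2])
print("population slope:   ", round(np.polyfit(g_all, t_all, 1)[0], 3))

# Within-family estimate (sibling differences): close to zero, as it should be.
print("within-family slope:", round(np.polyfit(g1 - g2, t1 - t2, 1)[0], 3))
```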

Closing thoughts

GWAS are a relatively new tool, one that definitely has potential. If we can figure out which diseases do and do not have a genetic component, and to what extent, it will enable medicine to start imagining new treatments accordingly. Nonetheless, one should keep in mind one’s stats 101 course: correlation is not causation. New, improved GWAS methods, such as the within-family method, will most likely keep emerging, enabling us to control for more and more possible confounds. Be wary, however, as even that does not constitute a robust causal method, but it will at least get us a bit closer to the answers we want. Perhaps one day we will develop true genetic causal methods, although at the moment I have no idea how that would be possible. Will future science prove me wrong?

The reason I don’t believe we will ever develop artificial general intelligence

I got interested in artificial intelligence because the idea of artificial general intelligence, or AGI, amazed me.

It seems that the ambition of creating artificial life has inhabited people’s minds for millennia. In Greek mythology, Hephaestus forged and gave life to the bronze giant Talos, a form of artificial life.

Nonetheless, I currently believe that this hope will never be fulfilled. My argument underpinning this belief is short and based on a very useful heuristic, or rule of thumb, that I will lay out here. You can use this heuristic in many other situations, as I believe it is widely applicable and genuinely powerful.

Here is my argument:

The human brain is a system that is way too complex for us to understand and reproduce.

Now you might want to write an angry comment saying that this is no argument at all and that many things appeared too complex to us before we finally understood them, but read on for a little while.

To back up my claim, I will use as an example a paper written in 1978 by the physicist Sir Michael Berry. I will admit that the paper is very technical and that I do not have the mathematical background to fully grasp it at the moment, so I will rely on author Nassim Taleb’s account of it in his book The Black Swan. Imagine you are dealing with a billiard table with one ball on it:

If you know a set of basic parameters concerning the ball at rest, you can compute the resistance of the table (quite elementary), and you can gauge the strength of the impact, then it is rather easy to predict what would happen at the first hit. The second impact becomes more complicated, but possible; and more precision is called for. The problem is that to correctly compute the ninth impact, you need to take into account the gravitational pull of someone standing next to the table (modestly, Berry’s computations use a weight of less than 150 pounds). And to compute the fifty-sixth impact, every single elementary particle in the universe needs to be present in your assumptions! An electron at the edge of the universe, separated from us by 10 billion light-years, must figure in the calculations, since it exerts a meaningful effect on the outcome.

The Black Swan, Nassim Taleb

As you can see, in this seemingly simple system of a ball and a table, after a few tens of impacts the trajectory of the ball becomes completely unpredictable. This is a property of chaotic dynamical systems: given very slight differences in initial conditions, they first behave similarly, then diverge completely and take radically different trajectories. A typical example of this is the double pendulum.
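
The double pendulum itself requires numerically integrating its equations of motion, but the underlying phenomenon, sensitive dependence on initial conditions, can be shown with an even simpler textbook chaotic system, the logistic map. A minimal sketch:

```python
# Two trajectories of the logistic map x -> r*x*(1-x), chaotic at r = 4.0,
# started a mere 1e-10 apart.
r = 4.0
x_a, x_b = 0.2, 0.2 + 1e-10

for step in range(51):
    if step % 10 == 0:
        print(f"step {step:2d}   gap = {abs(x_a - x_b):.3e}")
    x_a = r * x_a * (1 - x_a)
    x_b = r * x_b * (1 - x_b)

# The gap grows from 1e-10 to order 1 within a few dozen steps: the two
# trajectories become completely unrelated despite near-identical starts.
```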

Now, to come back to our initial subject, the brain, one could argue it is an infinitely more complex system than a billiard table and a ball. If we cannot predict the behavior of a ball on a table after a certain number of impacts, could we really predict the response of a system composed of 100 billion neurons interacting through 100 trillion synapses to, let’s say, a pain stimulus? Infinitely small variations in the stimulus could lead to widely different responses.

Indeed, the brain, like most sufficiently complex systems, displays opacity (the brain is like a black box; we don’t know what’s going on inside), emergence (the system cannot be understood by looking at its parts in isolation) and non-linearity (small changes in conditions can lead to disproportionate responses), all of which make it extremely difficult to study.

Although the rise of Big Data and the explosion of computing power have allowed increasingly massive brain simulations to be built, these remain biologically inaccurate. Furthermore, even if we were able to mimic the “hardware” perfectly, it would remain nearly impossible to produce the appropriate “software” to reproduce general intelligence.

Although I am genuinely interested in science, and I believe we still have many amazing discoveries to make, I do believe that some problems are just too complex to be solved. Implanting general intelligence in machines is one of them. Although engineers are sometimes incredibly resourceful, they simply cannot compete against billions of years of natural selection.

Economics talk with Kilian Tep: Rent-seeking, Keynes, Hayek and markets

Kilian Tep is a friend of mine who is a data scientist on top of having studied economics. We share several interests and decided to have a conversation about them. We discuss how Keynesian policies often backfire and create inequality, compare markets to centrally planned economies, and delve into how all of these elements relate to corruption and rent-seeking. You can listen to the podcast below. Enjoy.

A short introduction to the replication crisis and fraud in academia

Over the last year and a half, a phenomenon in academia has caught my attention: a big chunk of the scientific papers published in reputable journals do not replicate. In this article, we will try to explain the reasons behind this crisis, its implications, and what we might do about it.

What is replication?

You could make a solid case for the view that the main goal of science is to find the laws of the world. Indeed, the scientific enterprise has, since its inception, expanded our understanding of our universe. A reason for that is the scientist’s ability to discover patterns or constants in our world. Metals expand when they are heated. That holds true whether you live in ancient Mesopotamia or modern Australia, and it will most likely be true in the future. This property is unaffected by either time or space, which, one could argue, is the definition of a law.

Even in the legal sense, laws should be applied equally across the jurisdiction for which they are designed, and they should be stable through time. That property demarcates the rule of law from arbitrary trials. It gives citizens a sense of legal security and makes judges predictable: if I do X, then law Y will apply.

Scientific laws have roughly the same purpose. They should be true universally and make the world more predictable and understandable to us. Since I know that metals expand when heated, I know that if I heat a piece of steel two weeks from now, it will expand. And since I know how the metal will behave in the future, I can use that knowledge to solve problems I might have.

“What does any of that have to do with the problems in academia?” you might ask. Well, replicating a study means conducting it again, by using the same methods and gathering new data the same way the former study did, or even by re-analyzing the same data a second time.

If our research methods are valid and enable us to find properties of the world that are perennially true, then if I conduct the same study twice, I should get the same outcome twice.

Unfortunately, for a significant portion of the studies in scientific journals, even the most prestigious ones, results do not replicate. This phenomenon affects almost all disciplines, some hit harder than others: medicine, psychology, economics, sociology, criminology, neuroscience, artificial intelligence and many more.

Why are scientific journals full of false findings?

First of all, I have to say that it is normal for some studies to yield false findings; that is just part of science. Look at it this way: of all the possible hypotheses you can make about the world, only a tiny fraction are true. There are far more molecules that don’t cure headaches than molecules that do. Imagine if you had to test them all to find a cure for headaches.

Let’s say you were to test 100,000 compounds, of which only one could cure a headache, and your testing methodology returned a false positive only 1% of the time. Even with that relatively low false positive rate, after testing every single compound you would end up with about 1,000 results claiming that a compound works when it doesn’t.
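
The arithmetic behind that claim, plus the probability that any given positive result is real, in a short sketch (assuming, beyond what is stated above, that the test never misses the one effective compound):

```python
n_compounds = 100_000
n_true = 1
false_positive_rate = 0.01

false_positives = (n_compounds - n_true) * false_positive_rate  # ~1,000
true_positives = n_true                                         # perfect sensitivity assumed
share_real = true_positives / (true_positives + false_positives)

print(f"expected false positives: {false_positives:.0f}")
print(f"chance a positive result is real: {share_real:.2%}")    # about 0.1%
```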

Of course, when researching a certain subject, you don’t test every single hypothesis; you make theory-backed guesses as to what might work and then test the hypotheses you find plausible (this is called inference to the best explanation, or abduction). Nonetheless, the asymmetry between true and false propositions still holds.

However, it seems unlikely that this asymmetry is the sole reason for the epidemic of false results in scientific journals. Let’s take a look at some numbers.

Field                                     Estimated replicability rate
Economics                                 30-60%
Machine learning (recommender systems)    ~40%
Psychology                                ~40%
Medicine                                  >50%
Marketing*                                ~40%
Political science*                        ~50%
Sociology*                                ~40%
Oncology                                  ~11%
Physics*                                  ~70%
Chemistry*                                ~60%
Biology*                                  ~60%
Estimated replicability rate by discipline

Note that in rows marked with an asterisk, the replicability rate was estimated through surveys of researchers rather than actual replication attempts, whereas for the other fields the studies were actually conducted again.

As you can see, it is not at all uncommon to find fields with a replicability rate of 50% or below. The problem is severe, and it seems to be worse in over-hyped disciplines such as machine learning or oncology.

Indeed, these findings are the result of perverse incentives created by the scientific publication system. In order to get their grants renewed, scientists have to publish papers in scientific journals, preferably prestigious ones, otherwise their careers might come to an end. This is known as the publish-or-perish effect.

Moreover, scientific journals tend to publish only novel, positive findings. The reason is that they are private companies that operate for profit. Five companies publish about half of all academic research; they lobby universities to get researchers to publish exclusively in their journals, and then make other scientists pay to access that research. This is a perfect example of rent-seeking: capturing public goods for private gain. Notably, Elsevier reportedly operates with a profit margin of around 40%, higher than that of most large corporations.

These two factors combined create the aforementioned incentives, which drive researchers to produce novel, positive findings at all costs, even if that means engaging in questionable research practices or outright falsifying results.

Questionable research practices are widespread in academia. It is very hard to gauge exactly how widespread, since, almost by definition, the individuals who engage in them try to conceal them.

Nonetheless, we do have some numbers. In a survey of biomedical postdocs, 27% said they were willing to select or omit data to improve their results in order to secure funding. Note that, as far as I know, that survey was not even anonymous! What is more, an anonymous survey of psychology researchers found that a majority had engaged in questionable research practices.

We can add to this body of evidence the testimony of a young social psychology researcher who reports having been forced out of her degree for refusing to engage in p-hacking. She also reported that her fellow researchers would engage in p-hacking to further their left-wing political agenda.

Yet another testimony by an economics researcher makes several concerning accusations. She reports that senior economists silence opinions that diverge from theirs, take credit for work that is not theirs, discriminate against some minorities and more.

Richard Thaler, an eminent researcher known for his contributions to behavioral economics and a former president of the American Economic Association, reportedly tried to discredit valuable research because it contradicted his views. Among this research is a paper reporting that only 33% of economics studies can be replicated without contacting the original authors, which I used in the table above to estimate the replicability rate in economics.

Machine learning, over-hyped as it is, suffers from an obsession with devising new state-of-the-art algorithms that achieve high scores on benchmark datasets. This obsession is fueled by scientific journals, which reward almost exclusively these kinds of studies while refusing to publish research that aims to solve practical, real-world problems with simpler algorithms. There also seems to be a bias towards “mathiness”, with reviewers reportedly asking authors to add mathematical formulas to make their papers seem more “sciency” and marketable.

To top it off, there appears to be no correlation between whether a paper replicates and its number of citations. This could reflect several issues that plague academia: researchers will sometimes refuse to cite colleagues with whom they compete for a grant, and they form citation alliances sometimes referred to as citation rings.

What can we do about it?

There are several initiatives that could be implemented in order to mitigate this situation.

First of all, science should be freely accessible, since it is funded with our tax money. This reform is necessary, but it could be very difficult to implement given the journals’ considerable lobbying power.

There are dozens of open-access journals, and many scientists choose to publish only in those. Nonetheless, early-career scientists have a strong incentive to publish in renowned journals in order to advance their careers and possibly get tenure. In many universities, tenure is conditional on publishing in these outlets.

Secondly, more replication studies should be undertaken. Online repositories are beginning to emerge to host this type of study, which doesn’t get much love from the oligarchs of scientific publishing.

Beyond that, scientists should submit their data and code along with their papers, not only because this makes fraud easier to detect, but also because open data and code make replication a lot easier. As some say, “In God we trust, all others bring data”.

Finally, I am personally of the opinion that we should completely ditch peer review, as there is scant evidence that it even beats random screening. It is likely that in the future, statistical models will be devised to rate the quality of a paper and its probability of replicating. To extract the necessary features for such a model, one could turn to natural language processing.

Such models are already being used in a related way: Brian Uzzi, a professor at Northwestern University, trained an NLP model to detect elements of language that indicate fraud or low confidence in the findings, rather than relying on the measurements and metrics of the study.
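
As a purely illustrative sketch (not Uzzi’s actual model, and with hypothetical placeholder data), this is roughly what such a text-based predictor could look like: bag-of-words features from abstracts feeding a simple classifier trained on the outcomes of past replication attempts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: abstracts of papers whose replication outcome
# is already known (1 = replicated, 0 = failed to replicate).
abstracts = [
    "We observe a robust effect across three preregistered samples...",
    "Our novel, groundbreaking result suggests a surprising association...",
]
replicated = [1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(abstracts, replicated)

# Probability that a new, unseen paper will replicate (toy output only).
print(model.predict_proba(["We report a marginally significant effect..."])[:, 1])
```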

Closing thoughts

Hopefully this piece has fulfilled its purpose by giving a thorough yet brief introduction to some of the major problems currently facing academia. It is regrettable that such a noble pursuit has become so corrupt, discouraging many young people from pursuing a career in academia, myself included.

Despite it all, I remain optimistic, insofar as academia does not have a monopoly on science, far from it. Private companies and institutes have been responsible for many scientific breakthroughs over the past two centuries, the most recent notable example being Google’s quantum supremacy demonstration. The private sector is particularly proficient at advancing applied, practical science, in other words, technology development.

I encourage everyone who is interested in science but critical of academia not to get disheartened with science as a whole. If you consider yourself a humanist, perhaps solving people’s everyday problems through knowledge matters more than theoretical progress. After all, don’t we pursue science to better our lot?