But most of us don’t know what it is or how it works.
The Bayes approach, famously used to crack the Nazi Enigma code during World War Two, is now used every day to manage uncertainty across science, technology, medicine and much more.
Our world view and resultant actions are often driven by a simple theorem, devised in secret more than 250 years ago by a quiet English mathematician and theologian, Thomas Bayes, and only published after his death.
Thomas Bayes’ insight was remarkably simple. The probability of a hypothesis being true depends on two criteria: (1) how sensible it is, based on current knowledge (the “prior”), and (2) how well it fits new evidence. Yet, for 100 years after his death, scientists typically evaluated their hypotheses against only the new evidence. This is the traditional hypothesis-testing (or frequentist) approach that most of us are taught in science class.
The difference between the Bayesian and frequentist approaches is starkest when an implausible explanation perfectly fits a piece of new evidence. Let me concoct the hypothesis: “The Moon is made of cheese.” I look skywards and collect relevant new evidence, noting that the Moon is cheesy yellow in colour. In a traditional hypothesis-testing framework, I would conclude that the new evidence is consistent with my radical hypothesis, thus increasing my confidence in it (purple panel below).
But using Bayes’ Theorem, I’d be more circumspect. While my hypothesis fits the new evidence, the idea was ludicrous to begin with, violating everything we know about cosmology and mineralogy. Thus, the overall probability of the Moon being cheese – which is a product of both terms – remains exceedingly low (orange panel below).
Bayesian inference considers how well the hypothesis fits existing knowledge, and how well it fits new evidence. For simplicity, the Normalising Constant has been omitted from the bottom formula.
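For readers who want the formula itself, Bayes’ Theorem can be written as below. The symbols are labels chosen here for illustration: H is the hypothesis (“the Moon is made of cheese”) and E is the new evidence (“the Moon looks yellow”). The first form includes the normalising constant P(E); the second drops it, leaving the product of the two terms described above.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
\qquad\Longrightarrow\qquad
P(H \mid E) \;\propto\; \underbrace{P(E \mid H)}_{\text{fit to new evidence}} \times \underbrace{P(H)}_{\text{prior}}
```

Even if the fit to the evidence is perfect, a prior close to zero keeps the product close to zero.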
Admittedly, this is an extreme caricature. No respectable scientist would ever bother testing such a dumb hypothesis. But scientists globally are always evaluating a huge number of hypotheses, and some of these are going to be rather far-fetched. For example, a 2010 study initially suggested that people with moderate political views have eyes that can literally see more shades of grey. This was later dismissed after further testing, conducted because the researchers recognised it was implausible to begin with. But it’s almost certain that other similar studies have been accepted uncritically.
A feature of Bayesian inference is that prior belief is most important when data are weak. And the stronger our prior belief, the more data we need to be swayed. We use this principle intuitively.
For example, if you are playing darts in a pub and a nearby stranger claims to be a professional darts player, you might initially assume the person is joking. You know almost nothing about the person, but the chances of meeting a real professional darts player are small: DartPlayers Australia says there are only about 15 in Australia.
If the stranger throws a dart and hits the bullseye, it still mightn’t sway you. It could just be a lucky shot. But if that person hits the bullseye ten times in a row, you would tend to accept their claim of being a professional. Your strong prior belief becomes overridden as evidence accumulates. Bayes’ Theorem at work again.
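The darts example can be sketched in a few lines of Python. This is a rough illustration rather than a real model of darts: the figure of 15 professionals comes from the paragraph above, but the population of roughly 25 million and the two bullseye rates (50 per cent for a professional, 5 per cent for anyone else) are assumptions made up for this sketch, as are the function names.

```python
def update(prior, p_hit_if_pro, p_hit_if_not):
    """Apply Bayes' Theorem once, after seeing a single bullseye."""
    numerator = p_hit_if_pro * prior
    evidence = numerator + p_hit_if_not * (1.0 - prior)  # normalising constant
    return numerator / evidence


def posterior_after_hits(prior, n_hits, p_hit_if_pro=0.5, p_hit_if_not=0.05):
    """Update the belief 'this stranger is a professional' after n bullseyes in a row.

    The default bullseye rates are assumed values, not measured ones.
    """
    belief = prior
    for _ in range(n_hits):
        # Yesterday's posterior becomes today's prior.
        belief = update(belief, p_hit_if_pro, p_hit_if_not)
    return belief


# Prior: about 15 professionals (from the article) in an assumed
# population of roughly 25 million people.
PRIOR_PRO = 15 / 25_000_000

if __name__ == "__main__":
    for hits in (0, 1, 5, 10):
        belief = posterior_after_hits(PRIOR_PRO, hits)
        print(f"{hits:2d} bullseyes in a row -> P(professional) = {belief:.6f}")
```

Under these assumed numbers, a single bullseye barely shifts the belief, while ten in a row push it above 99 per cent: the strong prior is eventually overridden by the accumulated evidence.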
We use prior knowledge from our experiences and memories, and new evidence from our senses, to assign probabilities to everyday things and manage our lives.
Consider something as simple as answering your work mobile phone, which you usually keep on your office desk when at work, or on the charger when at home. You are at home gardening and hear it ringing briefly inside the house. Your new data is consistent with it being anywhere indoors, yet you go straight to the charger.
You have combined your prior knowledge of the phone (usually either on the office desk, or on the charger at home) with the new evidence (somewhere in the house) to pinpoint its location. If the phone is not at the charger, then you use your prior knowledge of where you have sometimes previously left the phone to narrow down your search. You ignore most places in the house (the fridge, the sock drawer) as highly unlikely a priori, and home in on what you consider the most likely places until you eventually find the phone. You are using Bayes’ Theorem in your everyday life.
Bayesian reasoning now underpins vast areas of human enquiry, from cancer screening to global warming, genetics, monetary policy and artificial intelligence. For instance, risk assessment and insurance are two areas where Bayesian reasoning is fundamental. Every time a cyclone or flood hits a region, insurance premiums skyrocket. Why?
Risk can be tremendously complex to quantify and current conditions might provide scant information about likely future disasters. Insurers therefore estimate risk based on both current conditions and what’s happened before. Every time a natural disaster strikes, they update their prior information on that region into something less favourable. They foresee a greater probability of future claims, and so raise premiums.
Bayesian inference similarly plays an important role in medical diagnosis. A symptom (the new evidence) can be a consequence of various possible diseases (the hypotheses). But different diseases have different prior probabilities for different people. A major problem with online medical tools such as WebMD is that prior probabilities are usually not properly taken into account. “Dr Google” usually knows very little about your personal history. Even if relatively mild symptoms are entered, a huge range of possible ailments can be thrown up, ranging from the common cold to Ebola.
Your regular GP, who knows your medical records and recent history, will be able to rule out many of the more outlandish diseases a priori, and provide you with a narrower and more sensible diagnosis. Bayes’ Theorem once again.
Bayesian approaches allow us to extract precise information from vague data, to find narrow solutions from a huge universe of possibilities. They were central to how British mathematician Alan Turing cracked the German Enigma code. It has been argued that this hastened the Allied victory in World War II by at least two years and thus saved millions of lives.
To decipher a set of encrypted German messages, searching the near-infinite number of potential translations was impossible, especially as the code changed daily via different rotor settings on the tortuously complex Enigma encryption machine. Turing’s crucial Bayesian insight was that certain messages were much more likely than other messages. These likely solutions, or “cribs” as his team called them, were based on previously decrypted messages, as well as logical expectations. For example, messages from U-boats were likely to contain phrases related to weather or Allied shipping.
The strong prior information provided by these cribs greatly narrowed the number of possible translations that needed to be evaluated, allowing Turing’s codebreaking ‘bombe’ machine to decipher the Enigma code rapidly enough to outpace the daily changes.
Of course, problems can arise in Bayesian inference when priors are incorrectly applied. If our prior beliefs are badly skewed, we can fail to be convinced even by substantial contradictory evidence.
In law courts, for instance, this can lead to serious miscarriages of justice (see the prosecutor’s fallacy). In a famous example from the UK, Sally Clark was wrongly convicted in 1999 of murdering her two children. Prosecutors had argued that the probability of two babies dying of natural causes (the prior probability that she is innocent of both charges) was so low – one in 73 million – that she must have murdered them.
But they failed to take into account that the probability of a mother killing both of her children (the prior probability that she is guilty of both charges) was also incredibly low. So the relative prior probabilities that she was totally innocent or a double murderer were more similar than initially argued.
Clark was later cleared on appeal, with the appeal court judges criticising the use of the statistic in the original trial. This highlights how poor understanding of Bayes’ Theorem can have far-reaching consequences.
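The comparison the prosecution skipped can be written as a ratio of the two priors. This is a simplified sketch, assuming that double murder and two natural deaths are the only explanations on the table, and that each one fully accounts for the evidence (two dead children), so the likelihood terms cancel:

```latex
\frac{P(\text{double murder} \mid \text{two deaths})}
     {P(\text{two natural deaths} \mid \text{two deaths})}
= \frac{P(\text{double murder})}{P(\text{two natural deaths})}
```

A tiny denominator, such as the one in 73 million quoted at the trial, says little on its own. What matters is how it compares with the numerator, which is also tiny.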
Why are we so interested in Bayesian methodology? In our own field of study, evolutionary biology, as in much of science, Bayesian methods are becoming increasingly central.
From predicting the effects of climate change to understanding the spread of infectious diseases, biologists are typically searching for a few plausible solutions from a vast array of possibilities. In our research, which mainly involves reconstructing the history and evolution of life, these approaches can help us find the single correct evolutionary tree from literally billions of possible branching patterns. In work – as in everyday life – Bayesian methods can help us to find small needles in huge haystacks.
Mike Lee is an evolutionary biologist at Flinders University and the South Australian Museum.
Benedict King is a PhD student at Flinders University.
This article was originally published in The Conversation and has been adapted for InDaily by the authors.