It All Adds Up: The Certainty of Chance
Cast your mind back to before this column existed, when I was just a penniless mathematician penning my first GeekPlanet article. I suggested then that those believing mathematics a language were in error. It is, I insisted, a form of literature, a method for harnessing the language of arithmetic, allowing us to deliver concepts and ideas beyond what we see immediately upon the page.
This is not a truth without its difficulties, however. The most troublesome is this: every damn fool is convinced that they can write.
There is no bright line dividing simple stories from true literature, but the latter at least has conventions. They can be broken, sure; indeed, some of the greatest works of literature do just that, but you need to know those rules before you can safely break them. The writing should matter no less than the plot, and that plot should be recognisably unique, not merely the latest iteration of boy meets girl/detective meets case/space prince meets intergalactic dark lord. Granted, you can stumble into the realms of literature entirely by accident, but people shouldn’t fool themselves that their latest chapter of “Fatal Passion” or “Harry Potter and the Caves of Androzani” is going to cut the mustard.
So it is with statistics. Ask a layman for the airspeed velocity of an unladen swallow, and they will shrug their shoulders (once they’re done quoting Monty Python). They’re not zoologists. Enquire how best one might whip up a batch of sulphuric acid, and they will confess ignorance. They’re not chemists. Press them for a figure on the speed required to escape the Earth’s pull and they will give you happily blank looks. They’re not physicists (even though in truth that particular slice of rocket science isn’t exactly rocket science).
Ask them how likely Wayne Rooney is to score three times in his next five attempts, though, and suddenly everyone’s a goddamn expert.
Obviously, I don’t care how inaccurate people’s football predictions are. Nor am I claiming some kind of secret insight into Rooney that only statistics can provide (though I could give you a figure that would be, no pun intended, in the ball-park). However, the principle extends to far more serious issues, and more insidious phrasing. Instead of goal tallies, it’s Muslim participation in and relations with society in general. Instead of “How plausible is it that this will happen”, it’s “How plausible is it that this did happen without it suggesting Muslims are benefiting from positive discrimination/actually have it easy/all part of an international conspiracy to destroy the West headed by Osama Bin Laden, Imran Khan, and Art Malik?”
I’ve highlighted two examples of this before. In both cases, what made their clumsy number-crunching so disgraceful is that methods already exist to consider such things dispassionately and precisely. Indeed, such articles tend to be laden with such weasel words as “It would at least appear” and “Might it not be the case that” precisely because they are afraid to submit their ideas to proper analysis, at which time they’re liable to fall apart. Rather than strengthen their argument (which, of course, they can’t) they weaken their conclusions, hoping that the message will still come across. That the damage will still be done.
By way of an example, let’s return to Paul, the allegedly psychic German octopus. Let’s ask a question: “Could he have gotten so many correct answers if he was just picking teams without preference?”, and see what the data reveals.
First, let’s look at his record. In 14 games, Paul picked the winning side 12 times. That certainly seems like a lot *. Is it, though? We turn to the endlessly useful binomial distribution. For those who don’t remember, the binomial distribution allows us to calculate how likely it is that a specific number of trials will succeed, assuming we know how many trials there are, that each one can only succeed or fail (with the same chance of success every time), and that no trial has an effect on any other.
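For anyone who would rather see that as arithmetic than as a definition, here is a minimal Python sketch of the calculation, under exactly the assumptions listed above (a fixed number of trials, the same chance of success each time, and no trial affecting another). The function name `binomial_pmf` and the coin example are made up for illustration.

```python
from math import comb


def binomial_pmf(successes: int, trials: int, p: float) -> float:
    """Chance of exactly `successes` successes in `trials` independent
    trials, each of which succeeds with probability `p`."""
    # Number of different orders in which the successes could fall
    ways = comb(trials, successes)
    # Chance of any one such order: p for each success, (1 - p) for each failure
    return ways * p**successes * (1 - p) ** (trials - successes)


# Illustrative example: exactly 3 heads in 5 tosses of a fair coin
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```

Adding up the results for every possible number of successes, from none to all of them, always gives 1, which is a handy sanity check.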
This seems to fit Paul’s situation pretty well. The trials are the process of picking, with success when the winning team is chosen. We can assume his chances of guessing right do not change; either because he really is just a humble sea creature picking at random, or because the ancestral ghosts of the octopus civilisation are equally likely to whisper the answer to him each time. It also seems likely that the result of one trial will not affect any other; octopuses are evil and mendacious creatures, and unlikely to suffer a crisis of confidence.
Unfortunately, we don’t actually know what the chance of a correct guess is. We do know that if Paul isn’t psychic, and his handlers aren’t pulling a fast one, then he should have had a 50% chance of succeeding with each guess.
This is a null hypothesis, something we assume is true until overwhelming evidence changes our mind. So how plausible is it that Paul received no help either from overly enthusiastic German patriots or cephalopodic forces beyond the ken of mortal man? Put another way, assuming we are right, how likely is it that Paul would do so well entirely by chance?
We need to calculate the probability that 14 trials with a 50% chance of success will succeed at least 12 times. Why at least 12? For the same reason that, if you tossed a coin 20 times to check whether it was fair, you wouldn’t expect exactly 10 heads. Indeed, even for a fair coin, the chance of exactly 10 heads in 20 is only 17.6%, because although it’s the most likely of the 21 possible results (from no heads through to 20), you’re still much more likely to not get exactly 10 heads. On the other hand, the probability of getting at least ten heads is 58.8%.
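Those two coin figures are easy to check for yourself. The following is a small sketch, assuming a perfectly fair coin; the helper name `chance_of_heads` is hypothetical.

```python
from math import comb

TOSSES = 20


def chance_of_heads(heads: int, tosses: int = TOSSES) -> float:
    """Chance of exactly `heads` heads in `tosses` tosses of a fair coin."""
    # Every one of the 2**tosses head/tail sequences is equally likely,
    # so we just count the sequences with the right number of heads.
    return comb(tosses, heads) / 2**tosses


exactly_ten = chance_of_heads(10)
# "At least ten" means 10, 11, ..., 20 heads, so add those chances together
at_least_ten = sum(chance_of_heads(h) for h in range(10, TOSSES + 1))

print(f"exactly 10 heads:  {exactly_ten:.1%}")   # 17.6%
print(f"at least 10 heads: {at_least_ten:.1%}")  # 58.8%
```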
If our null hypothesis is true, there is no ultimate difference between letting Paul pick a team and simply deciding by a coin toss. As it turns out, the chance of 14 tosses resulting in at least 12 heads is 0.65%. Clearly, that’s pretty damn low. In fact, it’s so low that our assumption that Paul is just as likely to pick the loser as the winner now seems distinctly implausible. The vast majority of people who use statistics would suggest any value below 5% (and certainly 1%) is sufficiently unlikely as to make us reject our null hypothesis. On the other hand, if we’d decided for some reason that Paul’s chance of success was 70% (he chooses the winner seven times out of ten), he would have a 16% chance of providing at least 12 correct guesses. That’s greater than 5% or even 10%, and so in that event we’d conclude that there is insufficient reason to change our minds about Paul’s chances of guessing correctly.
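The whole test fits in a dozen lines of Python. This is a sketch rather than a definitive implementation: the function name `chance_at_least` is invented for the example, and the 5% cut-off is simply the conventional threshold mentioned above.

```python
from math import comb


def chance_at_least(successes: int, trials: int, p: float) -> float:
    """Chance of `successes` or more successes in `trials` independent
    trials, each succeeding with probability `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


THRESHOLD = 0.05  # conventional cut-off; an assumption, not a law of nature

# Paul's record: at least 12 correct picks in 14 games
for assumed_chance in (0.5, 0.7):
    tail = chance_at_least(12, 14, assumed_chance)
    verdict = "reject" if tail < THRESHOLD else "do not reject"
    print(f"assumed chance {assumed_chance:.0%}: tail {tail:.2%} -> {verdict}")
```

With an assumed chance of 50% the tail probability comes out at roughly 0.65% and the hypothesis is rejected; with 70% it is roughly 16% and the hypothesis survives.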
Naturally, this isn’t the end of the story. We’re satisfied that Paul’s chance of success can’t be 50%, but we still don’t know what it actually is (there is a wide range of possible answers, none of which the above method would have rejected; part of the appeal of hypothesis testing is that it requires overwhelming evidence before you can claim something is true). Nor do we know why the chance isn’t 50%. Most likely, his handlers are stacking the metaphorical deck, deciding which team they find most likely to win and offering that box first. Or perhaps they always push Germany’s box toward him, making him much more likely (though not certain) to choose it. That shifts the question from “Why is Paul so uncannily accurate?” to “Why can’t our national team be as good as Germany’s?”
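To get a feel for how wide that range of unrejected answers is, one can simply run the same test for a whole grid of candidate success chances. The sketch below does this under some loudly stated assumptions: a 5% cut-off, a grid in steps of one percentage point, and the same "at least 12 out of 14" question as before. Because that question only looks at the upper tail, it can only rule out chances that are too low; it says nothing about chances that are too high. The names here are hypothetical.

```python
from math import comb


def chance_at_least(successes: int, trials: int, p: float) -> float:
    """Chance of `successes` or more successes in `trials` independent
    trials, each succeeding with probability `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


THRESHOLD = 0.05  # assumed cut-off for rejection
CORRECT, GAMES = 12, 14  # Paul's record

# Candidate success chances from 1% to 99%, in steps of one percentage point
candidates = [i / 100 for i in range(1, 100)]

# Keep every candidate the test would NOT have rejected
not_rejected = [
    p for p in candidates if chance_at_least(CORRECT, GAMES, p) >= THRESHOLD
]

print(f"lowest chance not rejected: {min(not_rejected):.0%}")
print(f"number of surviving candidates: {len(not_rejected)}")
```

A two-sided version, which also asks whether 12 correct picks would be suspiciously few, would trim the top of the range as well.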
Those are questions for another time, though. The critical point here is that hypothesis testing isn’t difficult, especially with websites like this one which reduce the process to merely plugging in numbers. In other words, even if you never learn to apply it yourself, you now know to always be suspicious of anyone like Pipes and Goldberg who pose questions for which hypothesis testing is ideally suited, but don’t bother applying it. Either they don’t know what they’re doing, or they very much do know, and are hoping you don’t notice.
Hopefully I’ve now made that a little bit harder for them.
* For the purposes of the example we’ll ignore my previous points about the fact that we’d never have heard of him had he not already racked up a fairly impressive tally.