read and noted: istatistik


Tuesday, February 22, 2022

Ten Global Trends

You can't fix what is wrong in the world if you don't know what's actually happening. In this book, straightforward charts and graphs, combined with succinct explanations, will provide you with easily understandable access to the facts that busy people need to know about how the world is really faring.

Polls show that most smart people tend to believe that the state of the world is getting worse rather than better. Consider a 2016 survey by the global public opinion company YouGov that asked folks in 17 countries, "All things considered, do you think the world is getting better or worse, or neither getting better nor worse?" Fifty-eight percent of respondents thought that the world is getting worse, and 30 percent said that it is doing neither. Only 11 percent thought that things are getting better. In the United States, 65 percent of Americans thought that the world is getting worse, and 23 percent said neither. Only 6 percent of Americans responded that the world is getting better.

This dark view of the prospects for humanity and the natural world is, in large part, badly mistaken. We demonstrate this in these pages using uncontroversial data taken from official and scientific sources.

Of course, some global trends are negative. As Harvard University psychologist Steven Pinker says: "It's essential to realize that progress does not mean that everything gets better for everyone, everywhere, all the time. That would be a miracle, that wouldn't be progress." For example, man-made climate change arising largely from increasing atmospheric concentrations of carbon dioxide released from burning fossil fuels could become a significant problem for humanity during this century. The spread of plastic marine debris is a big and growing concern. Many wildlife populations are declining, and tropical forest area continues shrinking. In addition, far too many people are still malnourished and dying in civil and sectarian conflicts around the globe. And, of course, the world is afflicted by the current coronavirus pandemic.

However, many of the global trends we describe are already helping redress such problems. For example, the falling price of renewable energy sources incentivizes the switch away from fossil fuels. Moreover, increasingly abundant agriculture is globally reducing the percentage of people who are hungry while simultaneously freeing up land so that forests are now expanding in much of the world. And unprecedentedly rapid research has significantly advanced testing, tracking, and treatment technologies to ameliorate the coronavirus contagion.

PSYCHOLOGICAL GLITCHES MISLEAD YOU

So why do so many smart people wrongly believe that all things considered, the world is getting worse?

Way back in 1965, Johan Galtung and Mari Holmboe Ruge, from the Peace Research Institute Oslo, observed, "There is a basic asymmetry in life between the positive, which is difficult and takes time, and the negative, which is much easier and takes less time - compare the amount of time needed to bring up and socialize an adult person and the amount of time needed to kill him in an accident, the amount of time needed to build a house and to destroy it in a fire, to make an airplane and to crash it, and so on." News is bad news; steady progress is not news.

Smart people especially seek to be well informed and so tend to be voracious consumers of news. Since journalism focuses on dramatic things and events that go wrong, the nature of news thus tends to mislead readers and viewers into thinking that the world is in worse shape than it really is. This mental shortcut causes many of us to confuse what comes easily to mind with what is true; it was first identified in 1973 by behavioral scientists Amos Tversky and Daniel Kahneman as the "availability bias." Another reason for the ubiquity of mistaken gloom derives from a quirk of our evolutionary psychology. A Stone Age man hears a rustle in the grass. Is it the wind or a lion? If he assumes it's the wind and the rustling turns out to be a lion, then he's not an ancestor. We are the descendants of the worried folks who tended to assume that all rustles in the grass were dangerous predators and not the wind. Because of this instinctive negativity bias, most of us attend far more to bad rather than to good news. The upshot is that we are again often misled into thinking that the world is worse than it is.

"Judgment creep" is yet another explanation for the prevalence of wrong-headed pessimism. We are misled about the state of the world because we have a tendency to continually raise our threshold for success as we make progress, argue Harvard University psychologist Daniel Gilbert and his colleagues. "When problems become rare, we count more things as problems. Our studies suggest that when the world gets better, we become harsher critics of it, and this can cause us to mistakenly conclude that it hasn't actually gotten better at all," explains Gilbert. "Progress, it seems, tends to mask itself." Social, economic, and environmental problems are being judged intractable because reductions in their prevalence lead people to see more of them. More than 150 years ago, political scientist Alexis de Tocqueville noted a similar phenomenon as societies progress, one that has since been called the Tocqueville effect.

What, though, accounts for progress?

Some smart folk who acknowledge that considerable social, economic, and environmental progress has been made still worry that progress will not necessarily continue.

"Human beings still have the capacity to mess it all up. And it may be that our capacity to mess it up is growing," asserted Cambridge University political scientist David Runciman in a July 2017 Guardian article. He added: "For people to feel deeply uneasy about the world we inhabit now, despite all these indicators pointing up, seems to me reasonable, given the relative instability of the evidence of this progress, and the [unpredictability] that overhangs it. Everything really is pretty fragile."

Runciman is not alone. The worry that civilization is just about to go over the edge of a precipice has a long history. After all, many earlier civilizations and regimes have collapsed, including the Babylonian, Roman, Tang, and Mayan Empires, and more recently the Ottoman and Soviet Empires.

In their 2012 book, Why Nations Fail: The Origins of Power, Prosperity, and Poverty, economists Daron Acemoglu and James Robinson persuasively outline reasons for the exponential improvement in human well-being that started about two centuries ago.

They begin by arguing that since the Neolithic agricultural revolution, most societies have been organized around "extractive" institutions: political and economic systems that funnel resources from the masses to the elites.

In the 18th century, some countries, including Britain and many of its colonies, shifted from extractive to inclusive institutions. "Inclusive economic institutions that enforce property rights, create a level playing field, and encourage investments in new technologies and skills are more conducive to economic growth than extractive economic institutions that are structured to extract resources from the many by the few," they write. "Inclusive economic institutions are in turn supported by, and support, inclusive political institutions," which "distribute political power widely in a pluralistic manner and are able to achieve some amount of political centralization so as to establish law and order, the foundations of secure property rights, and an inclusive market economy." Inclusive institutions are similar to one another in their respect for individual liberty. They include democratic politics, strong private property rights, the rule of law, enforcement of contracts, freedom of movement, and a free press. Inclusive institutions are the basis of the technological and entrepreneurial innovations that produced a historically unprecedented rise in living standards in those countries that embraced them, including the United States, Japan, and Australia as well as the countries in Western Europe. They are qualitatively different from the extractive institutions that preceded them.

The spread of inclusive institutions to more and more countries was uneven and occasionally reversed. Those advances and, in University of Illinois at Chicago economist Deirdre McCloskey's view, the key role played by major ideological shifts resulted in what McCloskey calls the "great enrichment," which boosted average incomes thirtyfold to a hundredfold in those countries where they have taken hold.

The examples of societal disintegration cited earlier, whether Roman, Tang, or Soviet, occurred in extractive regimes. Despite crises such as the Great Depression, there are no examples so far of countries with long-established inclusive political and economic institutions suffering similar collapses.

In addition, confrontations between extractive and inclusive regimes, such as World War II and the Cold War, have generally been won by the latter. That suggests that liberal free-market democracies are resilient in ways that enable them to forestall or rise above the kinds of shocks that destroy brittle extractive regimes.

If inclusive liberal institutions can continue to be strengthened and further spread across the globe, the auspicious trends documented in this book will extend their advance, and those that are currently negative will turn positive. By acting through inclusive institutions to increase knowledge and pursue technological progress, past generations met their needs and hugely increased the ability of our generation to meet our needs. We should do no less for our own future generations. That is what sustainable development looks like.

Thursday, December 24, 2020

Scientific Numerology

This post is taken from Salih Durhan's blog (Akademik Matematik Blogu).

Suppose you have a hypothesis that "apples are good for cancer." How would you prove it? Setting the practical details aside, the basic method is this: you take a group of cancer patients and split it in two. The first group eats no apples (the control group); the second group eats one apple every day (the experimental group). Then you check which group lived longer. If the difference in average survival time is small, say a few months, is that enough to make apples a cancer drug? Exactly how large must the difference be before we say "yes, apples are good for cancer"? That is where statistics, and the p-value, come in.

The type of cancer and the patients' age, sex, and medical history make the question even harder, but let's not get hung up on those. Suppose we have two groups of 100 people each that are identical in every variable that could affect cancer; the control group eats no apples, and the experimental group eats one apple every day. We followed both groups for years and found that the experimental group's average survival time was 1 year longer. We are itching to declare that, scientifically, this 1-year difference is due to apples, but, heaven forbid, the result could also have arisen purely by chance. Under the generally accepted scientific approach, we first state the "null hypothesis":

$H_{0}$: Apples have no effect on the survival time of cancer patients.

Then we ask ourselves: if the null hypothesis were true, how likely would we be to observe a 1-year difference in survival time between the two groups?

The details belong in a technical post, but this question can be given a genuinely scientific answer, and in the literature that answer is called the p-value. If the p-value is very small, for example smaller than 0.01, then we can draw the following conclusion:

If apples were not good for cancer, the probability that the average survival times of control and experimental groups of 100 people each would differ by 1 year would be less than 1%. So apples must be good for cancer.
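The post leaves the calculation to "a technical post", so here is one possible sketch, not the author's method. It uses a permutation test: if apples do nothing, the group labels are interchangeable, so we shuffle them and count how often chance alone gives a difference at least as large as the observed one. The survival times are simulated, and every name and parameter (`permutation_p_value`, the 5-year and 6-year means, the 2-year spread) is an assumption made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(control, treated, n_shuffles=5000, seed=0):
    """Estimate a one-sided p-value for mean(treated) - mean(control).

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the labels and count how often chance alone produces a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction: the observed labelling counts as one arrangement.
    return (at_least_as_large + 1) / (n_shuffles + 1)

# Simulated (not real) survival times in years, 100 patients per group.
data_rng = random.Random(42)
control = [data_rng.gauss(5.0, 2.0) for _ in range(100)]
treated = [data_rng.gauss(6.0, 2.0) for _ in range(100)]

p_value = permutation_p_value(control, treated)
print(f"observed difference: {mean(treated) - mean(control):.2f} years")
print(f"estimated p-value:   {p_value:.4f}")
```

Note that the count uses differences "at least as large as" the observed one, which is how the p-value is usually defined; the probability of exactly a 1-year difference would be vanishingly small for any data.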

The models used and the hypotheses tested may be far more complex than our made-up scenario, but a significant part of science is done with p-values. This approach is so wrong that its problems are too many to list. For years, plenty of people have been writing that we should drop this p fetish, but to no avail. The scientific community still loves p; reducing a huge, tangled question, along with years of accumulated knowledge and literature, to a single number must be comforting to everyone. To begin with, there is no answer to the question of how small p must be for a result to count as scientific. If a hypothesis is tested by many researchers, sooner or later someone will find a small enough p, and studies that fail to find one (that is, fail to reach the desired result) generally go unpublished anyway. And what if some ill-intentioned people play various tricks to lower the p-value?
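The "sooner or later someone will find a small enough p" point can be made concrete with a short calculation. This is a sketch under two simplifying assumptions that are not in the original post: the null hypothesis is true, and the studies are independent. The function name and the 0.05 threshold are illustrative choices.

```python
def chance_of_at_least_one_small_p(n_studies, alpha=0.05):
    """P(at least one of n independent studies gets p < alpha) when the null is true."""
    return 1 - (1 - alpha) ** n_studies

for n_studies in (1, 5, 20, 50):
    chance = chance_of_at_least_one_small_p(n_studies)
    print(f"{n_studies:>2} independent studies -> {chance:.0%} chance of a 'significant' fluke")
```

If only the "significant" study gets published, readers see the fluke and never the studies that found nothing.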

But there is a much, much bigger problem. We do not actually know what the p-value is. Not just passers-by; scientists do not know either. In a study by the highly esteemed scientist Gigerenzer, psychology students, instructors, and instructors who teach statistics were given 6 false statements about the p-value. The results are dreadful. The share who marked at least one false statement as true was 100% among students, 90% among instructors, and 80% among instructors who teach statistics! We have to rescue the scientific method from this wretched state; otherwise fighting the "vaccines cause autism" mindset will become completely impossible.

Let's turn to the questions Gigerenzer asked in his study. Suppose we are in the cancer study above, and we computed the p-value and got 1%. Which of the following statements are true?

1. It is absolutely false that apples are not good for cancer.
2. The probability that apples are not good for cancer is 1%.
3. It is absolutely true that apples are good for cancer.
4. You can calculate the probability that apples are good for cancer.
5. You can calculate the probability that apples are not good for cancer.
6. If you repeated the same experiment many times, you would reach the same result with 99% probability.

1 and 3 are of course false, because we do not obtain a certain result. The others are trickier. If you reread the above, you will see that the p-value is exactly "the probability that the data sets give this result if apples are not good for cancer." In other words, the p-value does not directly measure any probability concerning whether apples are good for cancer, and all of the statements are false.
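The distinction can be written out explicitly. This is a sketch in standard notation rather than something taken from the post: $D$ stands for the observed data (a difference at least as large as 1 year) and $H_1$ for the alternative hypothesis that apples do have an effect; both symbols are assumptions introduced here.

```latex
% What the p-value measures: the chance of data this extreme, assuming H_0 is true.
p = P(D \mid H_0)

% What statements 2, 4 and 5 talk about: the chance of H_0 given the data (Bayes' rule).
P(H_0 \mid D) = \frac{P(D \mid H_0)\, P(H_0)}{P(D \mid H_0)\, P(H_0) + P(D \mid H_1)\, P(H_1)}
```

The second quantity needs a prior $P(H_0)$ and the probability of the data under the alternative, and the p-value supplies neither. That is why a p-value of 1% cannot be read as "a 1% chance that apples do nothing".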

Science is too serious a business to be left to a system programmed for publishing, landing jobs, and winning grants; in my view, the mess strewn around the p-value shows this.

Thursday, July 23, 2020

The New Balance

When a population is not growing over a long period of time, and the population curve is flat, this must mean that each generation of new parents is the same size as the previous one. For thousands of years up to 1800 the population curve was almost flat. Have you heard people say that humans used to live in balance with nature? Well, yes, there was a balance. But let’s avoid the rose-tinted glasses. Until 1800, women gave birth to six children on average. So the population should have increased with each generation. Instead, it stayed more or less stable. Remember the child skeletons in the graveyards of the past? On average four out of six children died before becoming parents themselves, leaving just two surviving children to parent the next generation. There was a balance. It wasn’t because humans lived in balance with nature. Humans died in balance with nature. It was utterly brutal and tragic. Today, humanity is once again reaching a balance. The number of parents is no longer increasing. But this balance is dramatically different from the old balance. The new balance is nice: the typical parents have two children, and neither of them dies. For the first time in human history, we live in balance.
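The arithmetic behind the two balances is simple enough to check directly. The sketch below uses only the figures quoted in the passage (six births with four deaths before parenthood, versus two births with none); the function name is a hypothetical helper introduced for illustration.

```python
def parents_per_parent(children_per_couple, deaths_before_parenthood):
    """Size of the next generation of parents relative to the current one."""
    survivors = children_per_couple - deaths_before_parenthood
    return survivors / 2  # two parents per couple

old_balance = parents_per_parent(children_per_couple=6, deaths_before_parenthood=4)
new_balance = parents_per_parent(children_per_couple=2, deaths_before_parenthood=0)
print(f"old balance: {old_balance:.1f}, new balance: {new_balance:.1f}")
```

Both ratios come out at 1.0, meaning each generation of parents exactly replaces itself; the difference lies entirely in how that ratio is reached.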

Factfulness

Every group of people I ask thinks the world is more frightening, more violent, and more hopeless (in short, more dramatic) than it really is.

Wednesday, March 20, 2019

Window and Scoreboard Metaphor to Understand Statistics

There are two kinds of people in life: those who like crude dualities and those who do not. And now that I’ve unmasked myself as the first, please allow me to introduce a statistical distinction I find helpful: windows vs. scoreboards.

A “window” is a number that offers a glimpse of reality. It does not feed into any incentive scheme. It cannot earn plaudits or incur punishments. It is a rough, partial, imperfect thing—yet still useful to the curious observer. Think of a psychologist asking a subject to rate his happiness on a scale from 1 to 10. This figure is just a crude simplification; only the most hopeless “1” would believe that the number is happiness.

Or imagine you’re a global health researcher. It’s not possible to quantify the physical and mental well-being of every human in a country. Instead, you look to summary statistics: life expectancy, childhood poverty, Pop-Tarts per capita. They’re not the whole reality, but they’re valuable windows into it.

The second kind of metric is a “scoreboard.” It reports a definite, final outcome. It is not a detached observation, but a summary judgment, an incentive scheme, carrying consequences.

Think of the score in a basketball game. Sure, bad teams sometimes beat good ones. But call the score a “flawed metric for team quality,” and people will shoot you the side-eye. You don’t score points to prove your team’s quality; you improve your team’s quality to score more points. The scoreboard isn’t a rough measure, but the desired result itself.

Or consider a salesperson’s total revenue. The higher this number, the better you’ve done your job. End of story.

A single statistic may serve as window or scoreboard, depending on who’s looking. As a teacher, I consider test scores to be windows. They gesture at the truth but can never capture the full scope of mathematical skills (flexibility, ingenuity, affection for “sine” puns, etc.). For students, however, tests are scoreboards. They’re not a noisy indicator of a nebulous long-term outcome. They are the outcome.

Many statistics make valuable windows but dysfunctional scoreboards. For example, heed the tale of British ambulances. In the late 1990s, the UK government instituted a clear metric: the percentage of “immediately life-threatening” calls that paramedics reached within eight minutes. The target: 75%.

Nice window. Awful scoreboard.

First, there was data fudging. Records showed loads of calls answered in seven minutes and 59 seconds; almost none in eight minutes and one second. Worse, it incentivized bizarre behavior. Some crews abandoned their ambulances altogether, riding bicycles through city traffic to meet the eight-minute target. I’d argue that a special-built patient-transporting truck in nine minutes is more useful than a bicycle in eight, but the scoreboard disagreed.

Tuesday, March 19, 2019

Replication of Previous Research

If a finding is true, then rerunning the experiment should generally yield the same outcome. If it’s false, then the result will vanish like a mirage.

Replication is slow, unglamorous work. It takes time and money while producing nothing new or innovative. But psychology knows the stakes and is beginning to face its demons. One high-profile project, published in 2015, performed careful replications of 100 psychological studies. The findings made headlines: 61 of the 100 failed to replicate.

In that grim news, I see progress. The research community is taking a sobering look in the mirror and owning up to the truth, as ugly as it may be. Now social psychologists hope that other fields, such as medicine, will follow their lead.

Science has never been defined by infallibility or superhuman perfection. It has always been about healthy skepticism, about putting every hypothesis to the test. In this struggle, the field of statistics is an essential ally. Yes, it has played a part in bringing science to the brink, but just as surely, it will play a part in bringing science back.

Not all 0.04s are created equal!

In 1925, a statistician named R. A. Fisher published a book called Statistical Methods for Research Workers. In it, he proposed a line in the sand: 0.05. In other words, let’s filter out 19 of every 20 flukes.

Why let through the other one in 20? Well, you can set the threshold lower than 5% if you like. Fisher himself was happy to consider 2% or 1%. But this drive to avoid false positives incurs a new risk: false negatives. The more flukes you weed out, the more true results get caught in the filter as well.

Suppose you’re studying whether men are taller than women. Hint: they are. But what if your sample is a little fluky? What if you happen to pick taller-than-typical women and shorter-than-typical men, yielding an average difference of just 1 or 2 inches? Then a strict p-value threshold may reject the result as a fluke, even though it’s quite genuine.

The number 0.05 represents a compromise, a middle ground between incarcerating the innocent and letting the guilty walk free.
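The trade-off can be sketched numerically. The example below assumes a one-sided z-test and a true effect that sits a fixed number of standard errors above zero; the value 2.5 is a hypothetical choice, not a figure from the text. It shows how the false-negative rate rises as the threshold is tightened from 5% to 1%.

```python
from statistics import NormalDist

def false_negative_rate(alpha, standardized_effect):
    """Chance that a real effect fails a one-sided z-test at level alpha.

    standardized_effect is the true effect divided by its standard error.
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha)
    return standard_normal.cdf(critical_value - standardized_effect)

# Hypothetical effect: 2.5 standard errors above zero.
for alpha in (0.05, 0.02, 0.01):
    missed = false_negative_rate(alpha, standardized_effect=2.5)
    print(f"threshold {alpha:.2f}: lets {alpha:.0%} of flukes through, misses {missed:.0%} of real effects")
```

With no real effect at all (a standardized effect of 0), the same function returns 1 minus the threshold, which is just the share of flukes the filter correctly rejects.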

For his part, Fisher never meant 0.05 as an ironclad rule. In his own career, he showed an impressive flexibility. Once, in a single paper, he smiled on a p-value of 0.089 (“some reason to suspect that the distribution… is not wholly fortuitous”) yet waved off one of 0.093 (“such association, if it exists, is not strong enough to show up significantly”).

To me, this makes sense. A foolish consistency is the hobgoblin of little statisticians. If you tell me that after-dinner mints cure bad breath (p = 0.04), I’m inclined to believe you. If you tell me that after-dinner mints cure osteoporosis (p = 0.04), I’m less persuaded. I admit that 4% is a low probability. But I judge it even less likely that science has, for decades, overlooked a powerful connection between skeletal health and Tic Tacs.

All new evidence must be weighed against existing knowledge. Not all 0.04s are created equal.
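One way to see why the two mint claims deserve different treatment is to apply Bayes' rule to a "significant" result. The sketch below is an illustration under stated assumptions, not a calculation from the book: the prior probabilities (50% for bad breath, 0.1% for osteoporosis), the 80% power, and the 5% threshold are all hypothetical numbers chosen to make the contrast visible.

```python
def probability_effect_is_real(prior, power=0.8, alpha=0.05):
    """P(effect is real | result is significant), by Bayes' rule."""
    true_positive = power * prior          # real effect, and the test detects it
    false_positive = alpha * (1 - prior)   # no effect, but the test flukes
    return true_positive / (true_positive + false_positive)

# Hypothetical priors for the two claims in the text.
claims = {"mints cure bad breath": 0.5, "mints cure osteoporosis": 0.001}
for claim, prior in claims.items():
    posterior = probability_effect_is_real(prior)
    print(f"{claim}: prior {prior:.1%} -> posterior {posterior:.1%}")
```

The same significant result leaves the plausible claim very likely true and the implausible one still very likely false, because the prior does most of the work.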

Thursday, February 28, 2019

Experimental Results

Scientists want true positives. They are known as “discoveries” and can win you things like Nobel Prizes, smooches from your romantic partner, and continued funding.

True negatives are less fun. They’re like thinking you’d tidied the house and done the laundry, only to realize that, nope, that was just in your head. You’d rather know the truth, but you wish it were otherwise.

By contrast, false negatives are haunting. They’re like looking for your lost keys in the right place but somehow not seeing them. You’ll never know how close you were.

Last is the scariest category of all: false positives. They are, in a word, “flukes,” falsehoods that, on a good hair day, pass for truths. They wreak havoc on science, sitting undetected in the research literature for years and spawning waste-of-time follow-ups. In science’s never-ending quest for truth, it’s impossible to avoid false positives altogether—but it’s crucial to keep them to a minimum.

That’s where the p-value comes in. Its whole purpose is to filter out flukes.

Statistics

A statistic is an imperfect witness. It tells the truth, but never the whole truth.

Thursday, December 13, 2018

The Importance of Rice

Like wheat, rice belongs to the grass family, the Poaceae, and it looks similarly unpromising as a food, yet it’s become one of the most important cereals feeding our huge global population. Rice contributes around a fifth of the calories and around an eighth of the total protein consumed worldwide. Some 740 million tons of rice are produced each year, and it’s grown on every continent except Antarctica. Although it’s becoming an increasingly important staple in both sub-Saharan Africa and Latin America, around 90 per cent of the world’s rice is grown and eaten in Asia. More than 3.5 billion people across the globe depend on rice as a staple, and it’s the most important food crop in low- and lower-middle-income countries. For the poorest 20 per cent of the tropical population around the world, rice provides more protein per person than beans, meat or milk.

Saturday, December 8, 2018

Probabilistic Thinking

Probability is everywhere, down to the very bones of the world. The probabilistic machinery in our minds—the cut-to-the-quick heuristics made so famous by the psychologists Daniel Kahneman and Amos Tversky—was evolved by the human species in a time before computers, factories, traffic, middle managers, and the stock market. It served us in a time when human life was about survival, and still serves us well in that capacity.

But what about today—a time when, for most of us, survival is not so much the issue? We want to thrive. We want to compete, and win. Mostly, we want to make good decisions in complex social systems that were not part of the world in which our brains evolved their (quite rational) heuristics.

For this, we need to consciously add in a needed layer of probability awareness. What is it and how can I use it to my advantage?

There are three important aspects of probability that we need to explain so you can integrate them into your thinking to get into the ballpark and improve your chances of catching the ball:

Bayesian thinking
Fat-tailed curves
Asymmetries

Thomas Bayes and Bayesian thinking: Bayes was an English minister in the first half of the 18th century, whose most famous work, “An Essay Toward Solving a Problem in the Doctrine of Chances” was brought to the attention of the Royal Society by his friend Richard Price in 1763—two years after his death. The essay, the key to what we now know as Bayes’s Theorem, concerned how we should adjust probabilities when we encounter new data.

The core of Bayesian thinking (or Bayesian updating, as it can be called) is this: given that we have limited but useful information about the world, and are constantly encountering new information, we should probably take into account what we already know when we learn something new. As much of it as possible. Bayesian thinking allows us to use all relevant prior information in making decisions. Statisticians might call it a base rate, taking in outside information about past situations like the one you’re in.

Consider the headline “Violent Stabbings on the Rise.” Without Bayesian thinking, you might become genuinely afraid because your chances of being a victim of assault or murder are higher than they were a few months ago. But a Bayesian approach will have you putting this information into the context of what you already know about violent crime.

You know that violent crime has been declining to its lowest rates in decades. Your city is safer now than it has been since this measurement started. Let’s say your chance of being a victim of a stabbing last year was one in 10,000, or 0.01%. The article states, with accuracy, that violent crime has doubled. It is now two in 10,000, or 0.02%. Is that worth being terribly worried about? The prior information here is key. When we factor it in, we realize that our safety has not really been compromised.
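The gap between the headline and the actual risk is the gap between relative and absolute change. A minimal sketch, using only the one-in-10,000 and two-in-10,000 figures from the paragraph above (the function name is a hypothetical helper):

```python
def risk_change(old_risk, new_risk):
    """Return (relative change, absolute change) between two probabilities."""
    relative = (new_risk - old_risk) / old_risk
    absolute = new_risk - old_risk
    return relative, absolute

last_year = 1 / 10_000   # from the text: one in 10,000
this_year = 2 / 10_000   # from the text: two in 10,000
relative, absolute = risk_change(last_year, this_year)
print(f"relative increase: {relative:.0%}")
print(f"absolute increase: {absolute * 100:.2f} percentage points")
print(f"chance of not being a victim this year: {1 - this_year:.2%}")
```

The relative increase is what makes the headline; the absolute increase is what matters for an individual's safety.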

Conversely, if we look at the diabetes statistics in the United States, our application of prior knowledge would lead us to a different conclusion. Here, a Bayesian analysis indicates you should be concerned. In 1958, 0.93% of the population was diagnosed with diabetes. In 2015 it was 7.4%. When you look at the intervening years, the climb in diabetes diagnosis is steady, not a spike. So the prior relevant data, or priors, indicate a trend that is worrisome.

It is important to remember that priors themselves are probability estimates. For each bit of prior knowledge, you are not putting it in a binary structure, saying it is true or not. You’re assigning it a probability of being true. Therefore, you can’t let your priors get in the way of processing new knowledge. In Bayesian terms, this is called the likelihood ratio or the Bayes factor. Any new information you encounter that challenges a prior simply means that the probability of that prior being true may be reduced. Eventually, some priors are replaced completely. This is an ongoing cycle of challenging and validating what you believe you know. When making uncertain decisions, it’s nearly always a mistake not to ask: What are the relevant priors? What might I already know that I can use to better understand the reality of the situation?
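The updating cycle described above can be sketched with Bayes' rule in odds form, where each piece of evidence multiplies the odds of a belief by its likelihood ratio. The starting confidence and the sequence of ratios below are hypothetical values chosen for illustration; none of them come from the article.

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

belief = 0.90  # hypothetical starting confidence in some prior belief
# Hypothetical evidence: ratios below 1 count against the belief, above 1 for it.
for likelihood_ratio in (0.5, 0.5, 2.0, 0.2):
    belief = update(belief, likelihood_ratio)
    print(f"likelihood ratio {likelihood_ratio}: belief is now {belief:.3f}")
```

A belief held with a probability rather than as a yes-or-no fact can move in either direction, and enough contrary evidence eventually displaces it.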

Now we need to look at fat-tailed curves: Many of us are familiar with the bell curve, that nice, symmetrical wave that captures the relative frequency of so many things from height to exam scores. The bell curve is great because it’s easy to understand and easy to use. Its technical name is “normal distribution.” If we know we are in a bell curve situation, we can quickly identify our parameters and plan for the most likely outcomes.

Fat-tailed curves are different. Take a look.

At first glance they seem similar enough. Common outcomes cluster together, creating a wave. The difference is in the tails. In a bell curve the extremes are predictable. There can only be so much deviation from the mean. In a fat-tailed curve there is no real cap on extreme events.

The more extreme events that are possible, the longer the tails of the curve get. Any one extreme event is still unlikely, but the sheer number of options means that we can’t rely on the most common outcomes as representing the average. The more extreme events that are possible, the higher the probability that one of them will occur. Crazy things are definitely going to happen, and we have no way of identifying when.

Think of it this way. In a bell curve type of situation, like displaying the distribution of height or weight in a human population, there are outliers on the spectrum of possibility, but the outliers have a fairly well defined scope. You’ll never meet a man who is ten times the size of an average man. But in a curve with fat tails, like wealth, the central tendency does not work the same way. You may regularly meet people who are ten, 100, or 10,000 times wealthier than the average person. That is a very different type of world.
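The "ten times the average" contrast can be put in numbers. The sketch below compares a normal distribution with a Pareto distribution, which is one common model of a fat-tailed quantity; treating wealth as Pareto is an assumption made here, not a claim from the article. All parameters are hypothetical and chosen only to show the difference in the tails.

```python
import math

def normal_tail(threshold, mean, std_dev):
    """P(X > threshold) for a normal (bell curve) distribution."""
    z = (threshold - mean) / std_dev
    return 0.5 * math.erfc(z / math.sqrt(2))

def pareto_tail(threshold, minimum, shape):
    """P(X > threshold) for a Pareto distribution, a classic fat-tailed curve."""
    if threshold <= minimum:
        return 1.0
    return (minimum / threshold) ** shape

# Hypothetical parameters, chosen only to make the contrast visible.
height_mean, height_sd = 1.0, 0.1      # heights in units of the average height
wealth_min, wealth_shape = 1.0, 1.5    # Pareto with mean shape * min / (shape - 1)
wealth_mean = wealth_shape * wealth_min / (wealth_shape - 1)

p_height = normal_tail(10 * height_mean, height_mean, height_sd)
p_wealth = pareto_tail(10 * wealth_mean, wealth_min, wealth_shape)
print(f"P(ten times the average height): {p_height:.3g}")
print(f"P(ten times the average wealth): {p_wealth:.3g}")
```

Under these assumptions the bell-curve probability is too small for floating-point arithmetic to tell apart from zero, while the fat-tailed one is roughly six in a thousand.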

Let’s re-approach the example of the risks of violence we discussed in relation to Bayesian thinking. Suppose you hear that you have a greater risk of slipping on the stairs and cracking your head open than of being killed by a terrorist. The statistics, the priors, seem to back it up: 1,000 people slipped on the stairs and died last year in your country and only 500 died of terrorism. Should you be more worried about stairs or terror events?

Some use examples like these to prove that terror risk is low—since the recent past shows very few deaths, why worry? The problem is in the fat tails: The risk of terror violence is more like wealth, while stair-slipping deaths are more like height and weight. In the next ten years, how many events are possible? How fat is the tail?

The important thing is not to sit down and imagine every possible scenario in the tail (by definition, it is impossible) but to deal with fat-tailed domains in the correct way: by positioning ourselves to survive or even benefit from the wildly unpredictable future, by being the only ones thinking correctly and planning for a world we don’t fully understand.

Asymmetries: Finally, you need to think about something we might call “metaprobability” —the probability that your probability estimates themselves are any good.

This massively misunderstood concept has to do with asymmetries. If you look at nicely polished stock pitches made by professional investors, nearly every time an idea is presented, the investor looks their audience in the eye and states they think they’re going to achieve a rate of return of 20% to 40% per annum, if not higher. Yet exceedingly few of them ever attain that mark, and it’s not because they don’t have any winners. It’s because they get so many so wrong. They consistently overestimate their confidence in their probabilistic estimates. (For reference, the general stock market has returned no more than 7% to 8% per annum in the United States over a long period, before fees.)

Another common asymmetry is people’s ability to estimate the effect of traffic on travel time. How often do you leave “on time” and arrive 20% early? Almost never? How often do you leave “on time” and arrive 20% late? All the time? Exactly. Your estimation errors are asymmetric, skewing in a single direction. This is often the case with probabilistic decision-making.
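One possible reason for the lopsidedness is that a trip can only get so much faster, while delays have no upper limit. Here is a minimal sketch of that idea, assuming a fixed best-case travel time plus an exponentially distributed delay. All of the numbers are hypothetical.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
PLANNED = 30.0           # minutes you budget for the trip (hypothetical)
FREE_FLOW = 27.0         # fastest realistic trip on empty roads (hypothetical)
MEAN_DELAY = 5.0         # average traffic delay in minutes (hypothetical)
N_TRIPS = 100_000

# Each trip takes the free-flow time plus a non-negative random delay.
trips = [FREE_FLOW + rng.expovariate(1 / MEAN_DELAY) for _ in range(N_TRIPS)]

# "20% early" means the trip took at most 80% of the planned time;
# "20% late" means it took at least 120% of the planned time.
early_share = sum(t <= 0.8 * PLANNED for t in trips) / N_TRIPS
late_share = sum(t >= 1.2 * PLANNED for t in trips) / N_TRIPS
print(f"arrive 20% early: {early_share:.1%}")
print(f"arrive 20% late:  {late_share:.1%}")
```

In this toy model the planned time sits close to the median trip, so the estimate is "right" about half the time. Even so, arriving 20% early never happens, while arriving 20% late is routine.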

Far more probability estimates are wrong on the “over-optimistic” side than the “under-optimistic” side. You’ll rarely read about an investor who aimed for 25% annual return rates who subsequently earned 40% over a long period of time. You can throw a dart at the Wall Street Journal and hit the names of lots of investors who aim for 25% per annum with each investment and end up closer to 10%.

This article was originally published at Farnam Street Blog.

Monday, July 16, 2018

Math, Probabilities and Football

Marco Altamirano, Nautilus

This year’s World Cup has been full of surprises. Tournament mainstays such as the Netherlands and Italy didn’t even qualify, and Germany, the reigning world champions, finished last in their group after upsets by Mexico and South Korea. Statisticians favored powerhouses Spain and Argentina to advance to the late stages of the tournament, only to see them lose to sleepers like Russia and Croatia.

Yet what this World Cup reveals isn’t that the stats were wrong—far from it, they were insightfully calculated—but rather that we relate to stats and probabilities in strange ways. Most fans, for example, enthusiastically bring up “x factors” and players who are “on fire,” while stat-wielding commentators coolly remind them that what appears to be a hot run is actually statistically regular and that a victory for the underdog remains forbiddingly unlikely. But then the whistle blows and the bizarre alchemy of the world takes over. Suddenly, a typically underwhelming team like Mexico starts to dazzle and, sensing an advantage, topples a giant.

To be sure, soccer is a sport that notoriously resists predictions. The batting averages and shooting percentages in baseball and basketball are far more reliable stats than anything in soccer for divining contest results, perhaps because the collective performance of a soccer team, as opposed to a baseball or basketball team, greatly outweighs any individual contribution from its players. This isn’t to say that probabilities in soccer are unreliable; it’s just that these probabilities apply better to classes of outcomes, like a set of coin tosses, than to this or that particular outcome, like whether the next coin flip will be heads or tails.

We say, for example, that there’s a 50 percent chance of a coin landing heads or tails. But technically, all this probability states is that in the immense class of coin-toss events, given enough tries, and all other things being equal, the coin will land heads half the time. Nevertheless, the coin could land heads 99 times in a row out of 100, and you can even expect that to happen, given enough tries.
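Both halves of that claim can be checked with a few lines of code. The sketch below simulates a fair coin to show the share of heads settling near one half as the number of tosses grows, and then computes the exact chance of getting 99 or more heads in a single batch of 100 tosses. The seed and batch sizes are arbitrary choices for illustration.

```python
import random
from fractions import Fraction
from math import comb

rng = random.Random(7)  # fixed seed so the run is repeatable


def heads_share(n_tosses):
    """Fraction of heads in n_tosses simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The share of heads drifts toward 0.5 as the class of events grows.
for n in (10, 1_000, 100_000):
    print(f"{n:>7} tosses: {heads_share(n):.3f} heads")

# Exact chance of at least 99 heads in one batch of 100 fair tosses:
# 100 ways to get exactly 99 heads, plus 1 way to get all 100.
p_streak = Fraction(comb(100, 99) + comb(100, 100), 2**100)
print(f"P(at least 99 heads in 100) = {float(p_streak):.2e}")
```

The exact probability works out to 101 in 2^100, which is tiny but not zero. "Given enough tries" here means something on the order of 10^28 batches of 100 tosses.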

Of course, sports matches are considerably more complex than coin tosses. There’s no need to develop statistics about how different coin-tossers performed against others in the history of coin-tossing to calculate the probability of a coin landing tails. But the underlying nature of the probability remains the same: evenly matched teams should win about half the time against each other, and teams that have performed well against other teams in the past should, in general, perform well against them in the future.

This explains the difference in perspective between the fan and the statistician. The statistician is interested in the grand scheme of events, where “x factors” and “hot runs” simply become a part of an average, whereas the fan is interested in the unlikely series of flourishes that might make this particular match exceptional. These two conflicting perspectives are part of what makes sports matches such passionate events. We know what to expect, but we also know something else is possible, so we hope, despite the odds.

So we can, with confidence, expect Germany to beat South Korea and Spain to beat Russia, but only most of the time. However, for better or worse, “most of the time” is patently not “today” or even “this World Cup.” And this makes the upcoming match between France, the 1998 World Champions, and Croatia, first-time finalists and the second-smallest nation ever to reach the championship match, all the more thrilling.
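A little arithmetic shows why “most of the time” leaves so much room for surprises. The sketch below assumes, purely for illustration, that the favorite wins each match independently with the same probability. Neither the 70 percent figure nor the independence assumption comes from real match data.

```python
def chance_of_at_least_one_upset(p_favorite_wins, n_matches):
    """Probability that the favorite loses at least once in n_matches,
    assuming independent matches with the same win probability."""
    return 1 - p_favorite_wins ** n_matches


# Hypothetical: favorites win 70% of the time in any single match.
for n_matches in (1, 4, 15):
    p = chance_of_at_least_one_upset(0.7, n_matches)
    print(f"{n_matches:>2} matches: {p:.1%} chance of at least one upset")
```

Under these assumptions a single upset is unlikely in any one match, yet a run of 15 matches with no upsets at all would be the real surprise.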

This piece was published in Nautilus as “Our Strange Relationship to World Cup Probabilities” on July 13, 2018.