Nassim Taleb’s “The Black Swan: The Impact of the Highly Improbable” is a long but great description of the common pitfalls when reasoning about risks and probabilities in the real world. A must-read for all data scientists, statisticians, and decision makers.
The book tackles a very important issue and popularized a term that is now in common use. Central to the book is the concept of the “black swan”, a highly improbable event with a big impact. The name comes from actual black swans, which exist in the real world, for example in Australia. However, people who lived in Europe had never seen one until 1697, when an expedition encountered them in Australia. For thousands of years, Europeans had seen only white swans. That was clearly not evidence that black swans don’t exist; they existed all along. However, the book argues that this kind of incorrect conclusion is drawn in many situations. Many similarly entertaining stories are provided to reinforce this, such as the farm animal that is more certain than ever, the day before its slaughter, that it will be provided with food and shelter forever.
Two more concepts from the book are Extremistan and Mediocristan, which represent situations where extreme and unlikely events matter (Extremistan) and situations where probabilities even out (Mediocristan). An example from Mediocristan is the average height of humans. If you sample 100 individuals and calculate their average height, it will be relatively close to the population average. If you do the same for wealth, the result depends a lot on the number of billionaires among the samples. The difference is that unlikely events (you happen to sample a billionaire) can have a large impact on the overall result when the impact of a single sample is large or even unbounded.
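To make the difference concrete, here is a minimal simulation sketch. The distributions and parameters are my own assumptions, not taken from the book: heights are modeled as roughly normal (mean 170 cm, standard deviation 10 cm), and wealth as Pareto-distributed with a heavy tail (shape 1.1). The helper `largest_share` is a name made up for this illustration; it measures how much of the total a single sample accounts for.

```python
import random


def largest_share(samples):
    """Fraction of the total contributed by the single largest sample."""
    return max(samples) / sum(samples)


def sample_heights(rng, n):
    # Assumed model (not from the book): heights in cm, roughly normal.
    return [rng.gauss(170, 10) for _ in range(n)]


def sample_wealth(rng, n):
    # Assumed model (not from the book): heavy-tailed Pareto wealth, shape 1.1.
    return [rng.paretovariate(1.1) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
for trial in range(5):
    heights = sample_heights(rng, 100)
    wealth = sample_wealth(rng, 100)
    print(
        f"trial {trial}: tallest person = {largest_share(heights):.1%} of total height, "
        f"richest person = {largest_share(wealth):.1%} of total wealth"
    )
```

Under these assumptions, the tallest person in a sample of 100 should account for only slightly more than 1% of the total height, while the richest person's share of total wealth tends to be far larger and to vary a lot from trial to trial.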
In other words, in some situations there are no “black swans” because each single event or sample can’t impact the big picture much. In these situations, highly improbable events can be ignored and risks can be managed by averaging and estimating probabilities based on collected statistics.
The more interesting situations are those where black swans have a big impact. Here, risks and opportunities can’t be calculated accurately or made irrelevant by averaging over many instances. Past data is of limited use, since highly improbable events require impractical amounts of data to measure. This is very common in the real world, but it’s also popular to assume the opposite for the sake of simplicity, which produces phony risk statistics that are inaccurate to the point of being useless. It’s like focusing on installing an unbreakable front door while leaving the window open: you miss the big picture by looking only at the aspects that are neat and easy to reason about. This is made worse by the human tendency to make up explanations and causes after the fact, even where nobody could have predicted an event beforehand.
The world is full of black swan events that had a huge impact on history. Facebook overtook Myspace. Google replaced AltaVista as the largest search engine. Microsoft and IBM have both lost their market-leading positions (without disappearing). The Soviet Union and the Berlin Wall fell. Wars and financial bubbles have been mostly unpredictable before they happened. The German army went around the Maginot Line in World War II (which we must assume was less obvious at the time). Harry Potter became the best-selling book series in history, which is why most of us know about it now. All of these events likely wouldn’t even have happened if they had been easily predicted.
What to do in an unpredictable world then? One insight the book provides is that we can often predict the existence of potential black swans without being able to predict the next black swan or its probability. This can be useful in two ways. Whenever you can (cheaply) get exposure to potential positive black swans, take the chance! In these situations, you can often pay a low cost for a small chance of a huge payoff, which easily gives you a positive expected value. On the other hand, when there’s a risk of negative black swans, be very wary of people making assumptions that exclude them (such as assuming normal distributions). Instead, try to limit the impact of potential black swans rather than assuming they won’t happen.
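The expected-value argument can be sketched with a bit of arithmetic. The numbers below are purely hypothetical and the function names are made up for this illustration. The point is that you don't need to know the exact probability of the payoff; you only need to believe it is higher than the break-even probability, which is the cost divided by the payoff.

```python
def expected_value(cost, payoff, probability):
    """Expected net gain of paying `cost` for a `probability` chance of `payoff`."""
    return probability * payoff - cost


def break_even_probability(cost, payoff):
    """Smallest probability at which the bet stops losing money on average."""
    return cost / payoff


# Hypothetical bet: pay 100 for a small chance of winning 1,000,000.
cost, payoff = 100, 1_000_000
print(f"break-even probability: {break_even_probability(cost, payoff):.4%}")

# The true probability is unknown, so try a few rough guesses.
for guess in (0.00001, 0.0001, 0.001):
    print(f"if p = {guess:.3%}: expected value = {expected_value(cost, payoff, guess):+.0f}")
```

When the payoff is huge relative to the cost, even a very rough guess at the probability is enough to tell whether the bet is worth taking. This only holds for bets where the loss is capped at the cost, which is exactly what separates positive black swan exposure from the negative kind.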
One area that this affects is career choice. Some careers are scalable, such as writing books. A best-selling book can sell millions of copies without any extra work. Unfortunately, best-selling books and authors are very much black swans. Nobody can predict which few authors will sell almost all the books. The other side of this is that most who invest in trying to become a best-selling author won’t succeed. The winner takes it all. On the other hand, there are many less scalable career choices, such as selling your time for work that needs to be done repeatedly. This way, you can’t get super rich on any particular day, but your income will be much more predictable. One way to get some of both worlds is to focus mostly on the predictable but also invest a little in highly speculative bets.
Finally, all is not gloomy when it comes to black swans. Randomness is also egalitarian in a way, as the book points out. Luck and the highly improbable are exactly why the rich and powerful don’t get richer and more powerful forever, and why the companies that many believe will be the biggest forever crumble and are replaced by newcomers every now and then. Unlikely events and luck are to some extent accessible to everyone. What’s important is that we don’t forget about them and fool ourselves into thinking that we can calculate risks neatly by assuming that black swan events won’t ever matter, simply because that would be convenient.