Notes on Tetlock and Gardner’s Superforecasting

Notes on Tetlock and Gardner’s Superforecasting: The Art and Science of Prediction.

Good book. A bit too journalisty for me at points but overall a very good balance of academic and popular. As is often the case, you could probably distill the majority of it down to ten pages (plus appendix on psychology). At the same time, the authors walk through complex ideas, skilfully illustrate the ideas with anecdotes and stories and retain the nuance of the underlying subject matter.

The most valuable part of the book for me and Life Itself is its concrete analytical insights into open-mindedness and good judgment, especially on the psychology/being side (as opposed to, say, numeracy—though even numeracy is interpreted into its probabilistic and then psychological aspects—see below re. “fate questionnaire”).

Key takeaways

  • There are people who are (very) good forecasters regarding political and world events. These are the “superforecasters” of the book’s title. But they aren’t the political “experts” you see on TV or in the newspapers.
    • These “superforecasters” are statistically well above average. If an ordinary predictor has 20/20 vision, then some superforecasters have 100/20 vision.
    • These superforecasters don’t have immediately-distinctive traits: they are ordinary people, though often pretty smart and curious. Most of the book is dedicated to examining what makes them good and to what extent training makes a difference.
    • This book follows up, complements and contrasts with Tetlock’s previous and more academic book, Expert Political Judgment: How Good Is It? How Can We Know? (2005), which found that most political experts did not perform that much better than chance – “monkeys throwing darts” (though, to be fair, quite sophisticated monkeys).
  • The book is based on a recent, large-scale research study by Tetlock that was sponsored by IARPA (Intelligence Advanced Research Projects Activity – like DARPA but for the US intelligence community). They recruited large numbers of ordinary people and tested them on lots of predictions. They also looked at whether training participants or organising them into teams helped.
  • The rest of the book looks into what makes these superforecasters good. Rough answers:
    • Reasonably smart, with decent basic statistical skills: they understand and apply base rates and do rough Bayesian updating.
    • Do lots of updating: people who were good updated a lot (but not too much).
    • Good psych profile: open-minded, curious, etc.
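
As a rough illustration of what "start from the base rate, then update" can look like, here is a minimal sketch of Bayes' rule in odds form. The function name, the 10% base rate and the likelihood ratio of 3 are made-up values for illustration, not figures from the book.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability using Bayes' rule in odds form.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: start from a 10% base rate, then observe evidence
# that is three times as likely if the event is going to happen.
base_rate = 0.10
posterior = bayes_update(base_rate, likelihood_ratio=3.0)
print(f"{base_rate:.0%} -> {posterior:.0%}")  # 10% -> 25%
```

A likelihood ratio of 1 leaves the forecast unchanged, which is one way to picture "updating a lot, but not too much": weak evidence should only nudge the number.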

Chapter Overviews

  1. An Optimistic Skeptic: sets out original skeptic background from reception of Expert Political Judgment (2005) and explains how Tetlock always had a more nuanced view (there were some exceptional people even in the original sample). Foxes vs hedgehogs.
  2. Illusions of Knowledge: why we think we are better than we are and the flaws of System 1.
  3. Keeping Score: how do you track predictive accuracy and what does it mean to be good at forecasting? Introduction to calibration vs discrimination plus the “Brier Score”, which is quadratic error. Brier score = squared error between the predicted probability and the actual outcome (1 if the event happens, 0 if not), averaged over forecasts. E.g. if I predict rain with probability 60% and it rains, my score is 0.16 ((1 - 0.6) squared); if it does not rain, my score is 0.36 ((0 - 0.6) squared).
  4. Superforecasters (SFs): defining who they are. They are people who are exceptionally good at forecasting – with ‘exceptionally good’ being statistically definable.
  5. Supersmart: are they super intelligent? No, but they are generally reasonably smart.
  6. Superquants: are the SFs just math geniuses? No, but they are all numerate and they have a good understanding of basic probability, including base rates etc. (something most of us don’t have).
  7. Supernewsjunkies: are SFs just good because they consume lots of information? Yes and no. It’s the quality and variety of what they consume; many of the good forecasters did not spend that much time reading material.
  8. Perpetual Beta: good forecasters keep updating (with new info) and questioning their forecasts.
  9. Superteams: does putting people in teams help, and how do teams function? Does averaging forecasts (or “extremising” them) help? Answer: yes, teams help, especially in the case of certain superforecasters. Team dynamics matter, as does having someone who coordinates and manages well. Extremising helps a lot and is most valuable for teams that are less aligned, i.e. whose members hold more diverse information.
  10. The Leader’s Dilemma: the qualities of good forecasters (“foxy”, not too confident, open to other options, etc.) contrast with the supposedly desirable qualities of leaders, who should be decisive, bold, confident etc. Plus, you need to allow the teams to self-organize. You can combine the strong pursuit of what you currently think with constant open-mindedness to being wrong (though this can’t be easy!). Moreover, decentralized, delegated leadership, flattish management etc. work. The authors use the pre-WWII German military as an example of an organization where leadership was delegated down the hierarchy and initiative expected: “The command principle of…Auftragstaktik blended strategic coherence and decentralized decision making with a simple principle: commanders were to tell subordinates what their goal is but not how to achieve it.”
  11. Are They Really So Super?
  12. What’s Next?
  13. Epilogue
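
The Brier score arithmetic from chapter 3 can be sketched in a few lines. This is a minimal illustration, not code from the book: the function names are my own, and the first function uses the one-outcome version from these notes (0 is perfect, 1 is worst). The book itself reports scores using Brier's original formulation, which sums the squared error over both outcomes and so runs from 0 to 2; the second function shows that variant.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_score_two_sided(forecasts, outcomes):
    """Brier's original formulation: squared error summed over both
    outcomes (event and non-event), so the score runs from 0 to 2."""
    # (p - o)^2 + ((1 - p) - (1 - o))^2 simplifies to 2 * (p - o)^2.
    return 2 * brier_score(forecasts, outcomes)


rain_forecast = 0.6
print(round(brier_score([rain_forecast], [1]), 2))  # it rains: 0.16
print(round(brier_score([rain_forecast], [0]), 2))  # it stays dry: 0.36
```

On the two-sided scale the same 60% forecast scores 0.32 if it rains and 0.72 if it does not.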

The chapters that stood out as most useful are discussed in more detail below.

Chapter 3 – Keeping Score (Expert Political Judgment)

In the mid-1980s Tetlock began a research programme to learn what sets the best forecasters apart. He recruited experts whose livelihoods involved analyzing political and economic trends and events. The experts made a total of roughly twenty-eight thousand predictions between them. The final results appeared in 2005.

If you didn’t know the punch line of EPJ before you read this book, you do now: the average expert was roughly as accurate as a dart-throwing chimpanzee. But as students are warned in introductory statistics classes, averages can obscure.

[…]

In the EPJ results, there were two statistically distinguishable groups of experts. The first failed to do better than random guessing, and in their longer-range forecasts even managed to lose to the chimp. The second group beat the chimp, though not by a wide margin, and they still had plenty of reason to be humble. Indeed, they only barely beat simple algorithms like “always predict no change” or “predict the recent rate of change.” Still, however modest their foresight was, they had some.

So why did one group do better than the other? It wasn’t whether they had PhDs or access to classified information. Nor was it what they thought—whether they were liberals or conservatives, optimists or pessimists. The critical factor was how they thought.

One group tended to organize their thinking around Big Ideas, although they didn’t agree on which Big Ideas were true or false. Some were environmental doomsters (“We’re running out of everything”); others were cornucopian boomsters (“We can find cost-effective substitutes for everything”). Some were socialists (who favored state control of the commanding heights of the economy); others were free-market fundamentalists (who wanted to minimize regulation). As ideologically diverse as they were, they were united by the fact that their thinking was so ideological. They sought to squeeze complex problems into the preferred cause-effect templates and treated what did not fit as irrelevant distractions. Allergic to wishy-washy answers, they kept pushing their analyses to the limit (and then some), using terms like “furthermore” and “moreover” while piling up reasons why they were right and others wrong. As a result, they were unusually confident and likelier to declare things “impossible” or “certain.” Committed to their conclusions, they were reluctant to change their minds even when their predictions clearly failed. They would tell us, “Just wait.”

The other group consisted of more pragmatic experts who drew on many analytical tools, with the choice of tool hinging on the particular problem they faced. These experts gathered as much information from as many sources as they could. When thinking, they often shifted mental gears, sprinkling their speech with transition markers such as “however,” “but,” “although,” and “on the other hand.” They talked about possibilities and probabilities, not certainties. And while no one likes to say “I was wrong,” these experts more readily admitted it and changed their minds.

[…]

I dubbed the Big Idea experts “hedgehogs” and the more eclectic experts “foxes.”

Foxes beat hedgehogs. And the foxes didn’t just win by acting like chickens, playing it safe with 60% and 70% forecasts where hedgehogs boldly went with 90% and 100%. Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t.
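
To make "calibration and resolution" concrete, here is a sketch using the standard Murphy decomposition of the Brier score: Brier = reliability - resolution + uncertainty. Reliability measures miscalibration (lower is better) and resolution measures how far the outcome rates behind each forecast value depart from the overall base rate (higher is better). This is an illustration only, not the scoring procedure used in EPJ, and the "chicken" and "bold" forecasters below are invented data.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by their exact probability value, so
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


# Invented data: ten questions, half of which resolve "yes".
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
chicken = [0.5] * 10             # always hedges at the base rate
bold = [0.8] * 5 + [0.2] * 5     # commits, and is right 4 times out of 5

print(tuple(round(x, 2) for x in murphy_decomposition(chicken, outcomes)))  # (0.0, 0.0, 0.25)
print(tuple(round(x, 2) for x in murphy_decomposition(bold, outcomes)))     # (0.0, 0.09, 0.25)
```

Both invented forecasters are perfectly calibrated, but only the bold one has any resolution, so its Brier score is 0.16 against the chicken's 0.25. Playing it safe buys calibration without foresight.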

[…]

Now look at how foxes approach forecasting. They deploy not one analytical idea but many and seek out information not from one source but many. Then they synthesize it all into a single conclusion. In a word, they aggregate. They may be individuals working alone, but what they do is, in principle, no different from what Galton’s crowd did. They integrate perspectives and the information contained within them. The only real difference is that the process occurs within one skull.

Chapter 5 – Supersmart?

Superforecasters seek diverse ideas and challenges to their beliefs.

But ultimately, as with intelligence, this has less to do with traits someone possesses and more to do with behavior. A brilliant puzzle solver may have the raw material for forecasting, but if he doesn’t also have an appetite for questioning basic, emotionally-charged beliefs, he will often be at a disadvantage relative to a less intelligent person who has a greater capacity for self-critical thinking. It’s not your raw crunching power that matters most. It’s what you do with it.

Look at Doug Lorch. His natural inclination is obvious. But he doesn’t assume it will see him through. He cultivates it. Doug knows that when people read for pleasure they naturally gravitate to the like-minded. So he created a database containing hundreds of information sources—from the New York Times to obscure blogs—that are tagged by their ideological orientation, subject matter, and geographical origin, then wrote a program that selects what he should read next using criteria that emphasize diversity. Thanks to Doug’s simple invention, he is sure to constantly encounter different perspectives. Doug is not merely open-minded. He is actively open-minded. Active open-mindedness (AOM) is a concept coined by the psychologist Jonathan Baron, who has an office next to mine at the University of Pennsylvania. Baron’s test for AOM asks whether you agree or disagree with statements like:

– People should take into consideration evidence that goes against their beliefs.

– It is more useful to pay attention to those who disagree with you than to pay attention to those who agree.

– Changing your mind is a sign of weakness.

– Intuition is the best guide in making decisions.

– It is important to persevere in your beliefs even when evidence is brought to bear against them.

Quite predictably, superforecasters score highly on Baron’s test. But more importantly, superforecasters illustrate the concept. They walk the talk.
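
The book does not describe how Doug Lorch's reading program works, so the following is purely a hypothetical sketch of the general idea: tag each source by ideology, subject and region, then prefer whichever unread source overlaps least with what has already been read. The function name, the scoring rule and the catalogue entries are all assumptions of mine.

```python
from collections import Counter


def pick_next(sources, already_read):
    """Pick the unread source whose tags overlap least with past reading."""
    seen = Counter(tag for src in already_read for tag in src["tags"])
    unread = [src for src in sources if src not in already_read]
    # Lower total means the source's tags have come up less often so far.
    return min(unread, key=lambda src: sum(seen[tag] for tag in src["tags"]))


# Hypothetical catalogue; names and tags are placeholders.
sources = [
    {"name": "Source A", "tags": ("left", "economics", "US")},
    {"name": "Source B", "tags": ("left", "politics", "US")},
    {"name": "Source C", "tags": ("right", "security", "Asia")},
]
already_read = [sources[0]]
print(pick_next(sources, already_read)["name"])  # Source C
```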

Chapter 6 – Superquants?

Superforecasters think probabilistically, not fatalistically.

A probabilistic thinker will be less distracted by “why” questions and focus on “how.” This is no semantic quibble. “Why?” directs us to metaphysics; “How?” sticks with physics. The probabilistic thinker would say, “Yes, it was extremely improbable that I would meet my partner that night, but I had to be somewhere and she had to be somewhere and happily for us our somewheres coincided.” The economist and Nobel laureate Robert Shiller tells the story of how Henry Ford decided to hire workers at the then-astonishingly high rate of $5 a day, which convinced both his grandfathers to move to Detroit to work for Ford. If someone had made one of his grandfathers a better job offer, if one of his grandfathers had been kicked in the head by a horse, if someone had convinced Ford he was crazy to pay $5 a day…if an almost infinite number of events had turned out differently, Robert Shiller would not have been born. But rather than see fate in his improbable existence, Shiller repeats the story as an illustration of how radically indeterminate the future is. “You tend to believe that history played out in a logical sort of sense, that people ought to have foreseen, but it’s not like that,” he told me. “It’s an illusion of hindsight.”

Even in the face of tragedy, the probabilistic thinker will say, “Yes, there was an almost infinite number of paths that events could have taken, and it was incredibly unlikely that events would take the path that ended in my child’s death. But they had to take a path and that’s the one they took. That’s all there is to it.” In Kahneman’s terms, probabilistic thinkers take the outside view toward even profoundly identity-defining events, seeing them as quasi-random draws from distributions of once-possible worlds.

Or, in Kurt Vonnegut’s terms, “Why me? Why not me?”

If it’s true that probabilistic thinking is essential to accurate forecasting, and it-was-meant-to-happen thinking undermines probabilistic thinking, we should expect superforecasters to be much less inclined to see things as fated. To test this, we probed their reactions to pro-fate statements like these:

– Events unfold according to God’s plan.

– Everything happens for a reason.

– There are no accidents or coincidences.

We also asked them about pro-probability statements like these:

– Nothing is inevitable.

– Even major events like World War II or 9/11 could have turned out very differently.

– Randomness is often a factor in our personal lives.

We put the same questions to regular volunteer forecasters, undergraduates at the University of Pennsylvania, and a broad cross section of adult Americans. On a 9-point “fate score,” where 1 is total rejection of it-was-meant-to-happen thinking and 9 is a complete embrace of it, the mean score of adult Americans fell in the middle of the scale. The Penn undergrads were a little lower. The regular forecasters were a little lower still. And the superforecasters got the lowest score of all, firmly on the rejection-of-fate side.

For both the superforecasters and the regulars, we also compared individual fate scores with Brier scores and found a significant correlation—meaning the more a forecaster inclined toward it-was-meant-to-happen thinking, the less accurate her forecasts were. Or, put more positively, the more a forecaster embraced probabilistic thinking, the more accurate she was.

So finding meaning in events is positively correlated with well-being but negatively correlated with foresight. That sets up a depressing possibility: Is misery the price of accuracy?

I don’t know. But this book is not about how to be happy. It’s about how to be accurate, and the superforecasters show that probabilistic thinking is essential for that. I’ll leave the existential issues to others.

Chapter 7 – Supernewsjunkies?

The beliefs which are connected to our egos and identities are the most difficult to change in light of contradictory evidence.

But not all disturbances are equal. Remember that Keynes quotation about changing your mind in light of changed facts? It’s cited in countless books, including one written by me and another by my coauthor. Google it and you will find it’s all over the Internet. Of the many famous things Keynes said it’s probably the most famous. But while researching this book, I tried to track it to its source and failed. Instead, I found a post by a Wall Street Journal blogger, which said that no one has ever discovered its provenance and the two leading experts on Keynes think it is apocryphal. In light of these facts, and in the spirit of what Keynes apparently never said, I concluded that I was wrong. And I have now confessed to the world. Was that hard? Not really. Many smart people made the same mistake, so it’s not embarrassing to own up to it. The quotation wasn’t central to my work and being right about it wasn’t part of my identity.

But if I had staked my career on that quotation, my reaction might have been less casual. Social psychologists have long known that getting people to publicly commit to a belief is a great way to freeze it in place, making it resistant to change. The stronger the commitment, the greater the resistance.

Jean-Pierre Beugoms is a superforecaster who prides himself on his willingness “to change my opinions a lot faster than my other teammates,” but he also noted “it is a challenge, I’ll admit that, especially if it’s a question that I have a certain investment in.” For Beugoms, that means military questions. He is a graduate of West Point who is writing his PhD dissertation on American military history. “I feel like I should be doing better than most [on military questions]. So if I realize that I’m wrong, I might spend a few days in denial about it before I critique myself.”

Commitment can come in many forms, but a useful way to think of it is to visualize the children’s game Jenga, which starts with building blocks stacked one on top of another to form a little tower. Players take turns removing building blocks until someone removes the block that topples the tower. Our beliefs about ourselves and the world are built on each other in a Jenga-like fashion. My belief that Keynes said “When the facts change, I change my mind” was a block sitting at the apex. It supported nothing else, so I could easily pick it up and toss it without disturbing other blocks. But when Jean-Pierre makes a forecast in his specialty, that block is lower in the structure, sitting next to a block of self-perception, near the tower’s core. So it’s a lot harder to pull that block out without upsetting other blocks— which makes Jean-Pierre reluctant to tamper with it.

The Yale professor Dan Kahan has done much research showing that our judgments about risks—Does gun control make us safer or put us in danger?—are driven less by a careful weighing of evidence than by our identities, which is why people’s views on gun control often correlate with their views on climate change, even though the two issues have no logical connection to each other. Psycho-logic trumps logic. And when Kahan asks people who feel strongly that gun control increases risk, or diminishes it, to imagine conclusive evidence that shows they are wrong, and then asks if they would change their position if that evidence were handed to them, they typically say no. That belief block is holding up a lot of others. Take it out and you risk chaos, so many people refuse to even imagine it.

When a block is at the very base of the tower, there’s no way to remove it without bringing everything crashing down. This extreme commitment leads to extreme reluctance to admit error, which explains why the men responsible for imprisoning 112,000 innocent people could be so dogged in their belief that the threat of sabotage was severe. Their commitment was massive. Warren was, deep down, a civil libertarian. Admitting to himself that he had unjustly imprisoned 112,000 people would have taken a sledgehammer to his mental tower.

This suggests that superforecasters may have a surprising advantage: they’re not experts or professionals, so they have little ego invested in each forecast. Except in rare circumstances—when Jean-Pierre Beugoms answers military questions, for example—they aren’t deeply committed to their judgments, which makes it easier to admit when a forecast is offtrack and adjust. This isn’t to say that superforecasters have zero ego investment. They care about their reputations among their teammates. And if “superforecaster” becomes part of their self-concept, their commitment will grow fast. But still, the self-esteem stakes are far less than those for career CIA analysts or acclaimed pundits with their reputations on the line. And that helps them avoid underreaction when new evidence calls for updating beliefs.

Chapter 8 – Perpetual Beta

We have learned a lot about superforecasters, from their lives to their test scores to their work habits. Taking stock, we can now sketch a rough composite portrait of the modal superforecaster.

In philosophic outlook, they tend to be:

CAUTIOUS: Nothing is certain

HUMBLE: Reality is infinitely complex

NONDETERMINISTIC: What happens is not meant to be and does not have to happen

In their abilities and thinking styles, they tend to be:

ACTIVELY OPEN-MINDED: Beliefs are hypotheses to be tested, not treasures to be protected

INTELLIGENT AND KNOWLEDGEABLE, WITH A “NEED FOR COGNITION”: Intellectually curious, enjoy puzzles and mental challenges

REFLECTIVE: Introspective and self-critical

NUMERATE: Comfortable with numbers

In their methods of forecasting they tend to be:

PRAGMATIC: Not wedded to any idea or agenda

ANALYTICAL: Capable of stepping back from the tip-of-your-nose perspective and considering other views

DRAGONFLY-EYED: Value diverse views and synthesize them into their own

PROBABILISTIC: Judge using many grades of maybe

THOUGHTFUL UPDATERS: When facts change, they change their minds

GOOD INTUITIVE PSYCHOLOGISTS: Aware of the value of checking thinking for cognitive and emotional biases

In their work ethic, they tend to have:

A GROWTH MINDSET: Believe it’s possible to get better

GRIT: Determined to keep at it however long it takes

I paint with a broad brush here. Not every attribute is equally important. The strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement. It is roughly three times as powerful a predictor as its closest rival, intelligence. To paraphrase Thomas Edison, superforecasting appears to be roughly 75% perspiration, 25% inspiration.

And not every superforecaster has every attribute. There are many paths to success and many ways to compensate for a deficit in one area with strength in another. The predictive power of perpetual beta does suggest, though, that no matter how high one’s IQ it is difficult to compensate for lack of dedication to the personal project of “growing one’s synapses.”

All that said, there is another element that is missing entirely from the sketch: other people. In our private lives and our workplaces, we seldom make judgments about the future entirely in isolation. We are a social species. We decide together. This raises an important question.

What happens when superforecasters work in groups?

Chapter 9 – Superteams

At the end of the year, the results were unequivocal: on average, teams were 23% more accurate than individuals.

Teams created a culture of constructive criticism.

“There was a lot of what I’ll call dancing around,” recalled Marty Rosenthal of his first year on a team. People would disagree with someone’s assessment, and want to test it, but they were too afraid of giving offense to just come out and say what they were thinking. So they would “couch it in all these careful words,” circling around, hoping the point would be made without their having to make it.

Experience helped. Seeing this “dancing around,” people realized that excessive politeness was hindering the critical examination of views, so they made special efforts to assure others that criticism was welcome. “Everybody has said, ‘I want push-back from you if you see something I don’t,’” said Rosenthal. That made a difference. So did offering thanks for constructive criticism. Gradually, the dancing around diminished.

Each team consisted of 12 superforecasters, with a nucleus of members who did most of the work.

Most teams have a nucleus of five or six members who do most of the work. Within that core, we might expect to see a division of labor that reduces the amount of effort any one person needs to invest in the task, at least if he or she approached forecasting as work, not play. But we saw the opposite on the best teams: workloads were divided, but as commitment grew, so did the amount of effort forecasters put into it. Being on the team was “tons more work,” Elaine said. But she didn’t mind. She found it far more stimulating than working by herself. “You could be supporting each other, or helping each other, or building on ideas,” she said. “It was a rush.”

Superteams outperformed prediction markets under experimental conditions.

We put that proposition to the test by randomly assigning regular forecasters to one of three experimental conditions. Some worked alone. Others worked in teams. And some were traders in prediction markets run by companies such as Inkling and Lumenogic. Of course, after year 1—when the value of teams was resoundingly demonstrated—nobody expected forecasters working alone to compete at the level of teams or prediction markets, so we combined all their forecasts and calculated the unweighted average to get the “wisdom of the crowd.” And of course we had one more competitor: superteams.

The results were clear-cut each year. Teams of ordinary forecasters beat the wisdom of the crowd by about 10%. Prediction markets beat ordinary teams by about 20%. And superteams beat prediction markets by 15% to 30%.

I can already hear the protests from my colleagues in finance that the only reason the superteams beat the prediction markets was that our markets lacked liquidity: real money wasn’t at stake and we didn’t have a critical mass of traders. They may be right. It is a testable idea, and one worth testing. It’s also important to recognize that while superteams beat prediction markets, prediction markets did a pretty good job of forecasting complex global events.

How did superteams do so well? By avoiding the extremes of groupthink and Internet flame wars. And by fostering minicultures that encouraged people to challenge each other respectfully, admit ignorance, and request help. In key ways, superteams resembled the best surgical teams identified by Harvard’s Amy Edmondson, in which the nurse doesn’t hesitate to tell the surgeon he left a sponge behind the pancreas because she knows it is “psychologically safe” to correct higher-ups. Edmondson’s best teams had a shared purpose. So did our superteams. One sign of that was linguistic: they said “our” more than “my.”

A team like that should promote the sort of actively open-minded thinking that is so critical to accurate forecasting, as we saw in chapter 5. So just as we surveyed individuals to test their active open-mindedness (AOM), we surveyed teams to probe their attitudes toward the group and patterns of interaction within the group—that is, we tested the team’s AOM. As expected, we found a correlation between a team’s AOM and its accuracy. Little surprise there. But what makes a team more or less actively open-minded? You might think it’s the individuals on the team. Put high AOM people in a team and you’ll get a high-AOM team; put lower-AOM people in a team and you’ll get a lower-AOM team. Not so, as it turns out. Teams were not merely the sum of their parts. How the group thinks collectively is an emergent property of the group itself, a property of communication patterns among group members, not just the thought processes inside each member. A group of open-minded people who don’t care about one another will be less than the sum of its open-minded parts. A group of opinionated people who engage one another in pursuit of the truth will be more than the sum of its opinionated parts [emphasis added].

Winning teams fostered a culture of sharing.

All this brings us to the final feature of winning teams: the fostering of a culture of sharing. My Wharton colleague Adam Grant categorizes people as “givers,” “matchers,” and “takers.” Givers are those who contribute more to others than they receive in return; matchers give as much as they get; takers give less than they take. Cynics might say that giver is a polite word for chump. After all, anyone inclined to freeload will happily take what they give and return nothing, leaving the giver worse off than if he weren’t so generous. But Grant’s research shows that the prosocial example of the giver can improve the behavior of others, which helps everyone, including the giver—which explains why Grant has found that givers tend to come out on top.

Marty Rosenthal is a giver. He wasn’t indiscriminately generous with his time and effort. He was generous in a deliberate effort to change the behavior of others for the benefit of all. Although Marty didn’t know Grant’s work, when I described it to him, he said, “You got it.” There are lots more givers on the superteams. Doug Lorch distributed programming tools, which got others thinking about creating and sharing their own.

Hold the excitement.

But let’s not take this too far. A busy executive might think “I want some of those” and imagine the recipe is straightforward: shop for top performers, marinate them in collaborative teams, strain out the groupthink, sprinkle in some givers, and wait for the smart decisions and money to start flowing. Sadly, it isn’t that simple. Replicating this in an existing organization with real employees would be a challenge. Singling out people for “super” status may be divisive and transferring people into cross-functional teams can be disruptive. And there’s no guarantee of results. There were eccentric exceptions to the tendencies outlined above, such as the few teams who were not mutually supportive but who nonetheless did well. One of the best superforecasters even refused to leave comments for his teammates, saying he didn’t want to risk groupthink.

This is the messy world of psychological research. Solid conclusions take time and this work, particularly on superteams, is in its infancy. There are many questions we have only begun to explore.

Chapter 12 – What’s Next?

Collaboration and depolarization of debate are the way forward.

Whether superforecasters can outpredict Friedman is both unknown and, for present purposes, beside the point. Superforecasters and superquestioners need to acknowledge each other’s complementary strengths, not dwell on each other’s alleged weaknesses. Friedman poses provocative questions that superforecasters should use to sharpen their foresight; superforecasters generate well-calibrated answers that superquestioners should use to fine-tune and occasionally overhaul their mental models of reality. The “Tom versus Bill” frame with which we started the book is our final false dichotomy. We need a Tom-Bill symbiosis. That’s a tall order. But there’s a much bigger collaboration I’d like to see. It would be the Holy Grail of my research program: using forecasting tournaments to depolarize unnecessarily polarized policy debates and make us collectively smarter.

[…]

There were attempts to extract lessons from events during those years, but they mostly involved brute force. Hammering opponents both for their forecasting failures and for not acknowledging them was a standard theme in the columns of Paul Krugman, whose Nobel Prize in economics and New York Times bully pulpit made him the most prominent Keynesian. Krugman’s opponents hammered back. Niall Ferguson wrote a three-part catalog of Krugman’s alleged failures. Back and forth it went, with each side poring over the other’s forecasts, looking for failures, deflecting attacks, and leveling accusations. For fans of one side or the other, it may have been thrilling. For those who hope that we can become collectively wiser, it was a bewildering fracas that looked less like a debate between great minds and more like a food fight between rival fraternities. These are accomplished people debating pressing issues, but nobody seems to have learned anything beyond how to defend their original position.

We can do better. Remember the “adversarial collaboration” between Daniel Kahneman and Gary Klein? These two psychologists won acclaim by developing apparently contradictory schools of thought, making each man a threat to the legacy of the other. But they were committed to playing by scientific ground rules, so they got together to discuss why they had such different views and how they could be reconciled. Something similar could, in principle, be done in forecasting.

Extremizing

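Extremizing basically means pushing an aggregated probability estimate further toward the nearest extreme: up toward 1 if it is above 0.5, down toward 0 if it is below. The rationale is that individual forecasters shade their estimates away from 0 and 1, since each holds only part of the available evidence, so a simple average understates what the pooled evidence supports. The aggregate need not share that caution.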

From the Edge masterclass with Philip Tetlock

From the book:

That’s the thinking behind the extremizing algorithm I mentioned in chapter 4. It works superbly, but its effectiveness depends on diversity. A team with zero diversity—its members are clones and everyone knows everything that everyone else knows—should not be extremized at all. Of course no team matches that description. But some teams are good at sharing information and that reduces diversity somewhat. Superforecaster teams were like that, which is why extremizing didn’t help them much. But regular forecasting teams weren’t as good at sharing information. As a result, we got major gains when we extremized them. Indeed, extremizing gave regular forecaster teams a big enough boost to pass some superteams, and extremizing a large pool of regular forecasters produced, as we saw earlier, tournament-winning results.

Tetlock et al. published a paper on this shortly before the publication of Superforecasting.

When aggregating the probability estimates of many individuals to form a consensus probability estimate of an uncertain future event, it is common to combine them using a simple weighted average. Such aggregated probabilities correspond more closely to the real world if they are transformed by pushing them closer to 0 or 1. We explain the need for such transformations in terms of two distorting factors: The first factor is the compression of the probability scale at the two ends, so that random error tends to push the average probability toward 0.5. This effect does not occur for the median forecast, or, arguably, for the mean of the log odds of individual forecasts. The second factor—which affects mean, median, and mean of log odds—is the result of forecasters taking into account their individual ignorance of the total body of information available. Individual confidence in the direction of a probability judgment (high/low) thus fails to take into account the wisdom of crowds that results from combining different evidence available to different judges. We show that the same transformation function can approximately eliminate both distorting effects with different parameters for the mean and the median. And we show how, in principle, use of the median can help distinguish the two effects.
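To make the idea concrete, here is a minimal sketch of an extremizing transform. The function `extremize`, the exponent `a` and the sample forecasts are my own illustrative assumptions; the form p^a / (p^a + (1 - p)^a) is one commonly used way of pushing probabilities away from 0.5, and I have not checked that it matches the paper's exact function or parameter values.

```python
def extremize(p: float, a: float) -> float:
    """Push probability p away from 0.5.

    a > 1 extremizes, a == 1 leaves p unchanged, a < 1 pulls p toward 0.5.
    The exponent is a tuning parameter (hypothetical values below).
    """
    return p**a / (p**a + (1 - p) ** a)


def mean(values):
    """Simple unweighted average."""
    return sum(values) / len(values)


# Hypothetical forecasts from five individuals for the same question.
forecasts = [0.60, 0.70, 0.65, 0.80, 0.75]
average = mean(forecasts)

# The larger the exponent, the further the aggregate moves toward 1.
for a in (1.0, 1.5, 2.5):
    print(f"a={a}: {extremize(average, a):.3f}")
```

Per the quoted passage, a team whose members already share all their information should get an exponent close to 1 (no extremizing), while a diverse pool of forecasters who share little can justify a larger one.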

Appendix – Ten Commandments for Aspiring Superforecasters

(1) Triage.

Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloud-like” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most.

For instance, “Who will win the presidential election, twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952? If you think you could have known it would be a then-unknown colonel in the United States Army, Dwight Eisenhower, you may be afflicted by one of the worst cases of hindsight bias ever documented by psychologists.

Of course, triage judgment calls get harder as we come closer to home. How much justifiable confidence can we place in March 2015 on who will win the 2016 election? The short answer is not a lot but still a lot more than we can for the election in 2028. We can at least narrow the 2016 field to a small set of plausible contenders, which is a lot better than the vast set of unknown (Eisenhower-ish) possibilities lurking in 2028.

Certain classes of outcomes have well-deserved reputations for being radically unpredictable (e.g., oil prices, currency markets). But we usually don’t discover how unpredictable outcomes are until we have spun our wheels for a while trying to gain analytical traction. Bear in mind the two basic errors it is possible to make here. We could fail to try to predict the potentially predictable or we could waste our time trying to predict the unpredictable. Which error would be worse in the situation you face?

(2) Break seemingly intractable problems into tractable sub-problems.

Channel the playful but disciplined spirit of Enrico Fermi who—when he wasn’t designing the world’s first atomic reactor—loved ballparking answers to headscratchers such as “How many extraterrestrial civilizations exist in the universe?” Decompose the problem into its knowable and unknowable parts. Flush ignorance into the open. Expose and examine your assumptions. Dare to be wrong by making your best guesses. Better to discover errors quickly than to hide them behind vague verbiage.

Superforecasters see Fermi-izing as part of the job. How else could they generate quantitative answers to seemingly impossible-to-quantify questions about Arafat’s autopsy, bird-flu epidemics, oil prices, Boko Haram, the Battle of Aleppo, and bond-yield spreads?

We find this Fermi-izing spirit at work even in the quest for love, the ultimate unquantifiable. Consider Peter Backus, a lonely guy in London, who guesstimated the number of potential female partners in his vicinity by starting with the population of London (approximately six million) and winnowing that number down by the proportion of women in the population (about 50%), by the proportion of singles (about 50%), by the proportion in the right age range (about 20%), by the proportion of university graduates (about 26%), by the proportion he finds attractive (only 5%), by the proportion likely to find him attractive (only 5%), and by the proportion likely to be compatible with him (about 10%). Conclusion: roughly twenty-six women in the pool, a daunting but not impossible search task.

There are no objectively correct answers to true-love questions, but we can score the accuracy of the Fermi estimates that superforecasters generate in the IARPA tournament. The surprise is how often remarkably good probability estimates arise from a remarkably crude series of assumptions and guesstimates.
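The Backus example above is just a chain of multiplications, which is easy to check. A small sketch follows, using only the figures as quoted; the names `population`, `filters` and `fermi_estimate` are mine. Note that the quoted percentages multiply out to roughly 19.5 rather than twenty-six, presumably because the percentages in the passage are rounded or differ from the inputs Backus actually used (I have not verified which).

```python
# Starting population and successive filters, exactly as quoted in the passage.
population = 6_000_000
filters = {
    "women": 0.50,
    "single": 0.50,
    "right age range": 0.20,
    "university graduates": 0.26,
    "he finds attractive": 0.05,
    "likely to find him attractive": 0.05,
    "likely to be compatible": 0.10,
}


def fermi_estimate(start, proportions):
    """Winnow a starting quantity down by a sequence of proportions."""
    estimate = start
    for proportion in proportions:
        estimate *= proportion
    return estimate


pool = fermi_estimate(population, filters.values())
print(f"Estimated pool: {pool:.1f}")  # about 19.5 with the quoted figures
```

Writing the assumptions out like this is the point of Fermi-izing: each guess is exposed and can be challenged separately.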

(3) Strike the right balance between inside and outside views.

Superforecasters know that there is nothing new under the sun. Nothing is 100% “unique.” Linguists be damned: uniqueness is a matter of degree. So superforecasters conduct creative searches for comparison classes even for seemingly unique events, such as the outcome of a hunt for a high-profile terrorist (Joseph Kony) or the standoff between a new socialist government in Athens and Greece’s creditors. Superforecasters are in the habit of posing the outside-view question: How often do things of this sort happen in situations of this sort?

So too apparently is Larry Summers, a Harvard professor and former Treasury secretary. He knows about the planning fallacy: when bosses ask employees how long it will take to finish a project, employees tend to underestimate the time they need, often by factors of two or three. Summers suspects his own employees are no different. One former employee, Greg Mankiw, himself now a famous economist, recalls Summers’s strategy: he doubled the employee’s estimate, then moved to the next higher time unit. “So, if the research assistant says the task will take an hour, it will take two days. If he says two days, it will take four weeks.” It’s a nerd joke: Summers corrected for employees’ failure to take the outside view in making estimates by taking the outside view toward employees’ estimates, and then inventing a funny correction factor.

Of course Summers would adjust his correction factor if an employee astonished him and delivered on time. He would balance his outside-view expectation of tardiness against the new inside-view evidence that a particular employee is an exception to the rule. Because each of us is, to some degree, unique.
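For fun, Summers's rule as recalled by Mankiw (double the estimate, then move to the next higher time unit) can be written down directly. This is only a sketch: the unit ladder `UNITS` and the function name `summers_correction` are my assumptions, not anything from the book.

```python
# Assumed ladder of time units, smallest to largest.
UNITS = ["minute", "hour", "day", "week", "month", "year"]


def summers_correction(amount, unit):
    """Double the estimate, then bump it to the next higher time unit."""
    index = UNITS.index(unit)
    if index == len(UNITS) - 1:
        raise ValueError(f"no unit larger than {unit!r} in this ladder")
    return 2 * amount, UNITS[index + 1]


print(summers_correction(1, "hour"))  # (2, 'day')
print(summers_correction(2, "day"))   # (4, 'week')
```

Both outputs match the examples in the quote: one hour becomes two days, two days become four weeks.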

(4) Strike the right balance between under- and overreacting to evidence.

Belief updating is to good forecasting as brushing and flossing are to good dental hygiene. It can be boring, occasionally uncomfortable, but it pays off in the long term. That said, don’t suppose that belief updating is always easy because it sometimes is. Skillful updating requires teasing subtle signals from noisy news flows—all the while resisting the lure of wishful thinking.

Savvy forecasters learn to ferret out telltale clues before the rest of us. They snoop for nonobvious lead indicators, about what would have to happen before X could, where X might be anything from an expansion of Arctic sea ice to a nuclear war in the Korean peninsula. Note the fine line here between picking up subtle clues before everyone else and getting suckered by misleading clues. Does the appearance of an article critical of North Korea in the official Chinese press signal that China is about to squeeze Pyongyang hard—or was it just a quirky error in editorial judgment? The best forecasters tend to be incremental belief updaters, often moving from probabilities of, say, 0.4 to 0.35 or from 0.6 to 0.65, distinctions too subtle to capture with vague verbiage, like “might” or “maybe,” but distinctions that, in the long run, define the difference between good and great forecasters.

Yet superforecasters also know how to jump, to move their probability estimates fast in response to diagnostic signals. Superforecasters are not perfect Bayesian updaters but they are better than most of us. And that is largely because they value this skill and work hard at cultivating it.
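The contrast between incremental updates and jumps can be illustrated with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. This is a sketch under my own assumptions; the function `update` and the likelihood ratios are hypothetical values chosen to reproduce the sizes of move mentioned in the passage, not figures from the book.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    Ratios near 1 are weakly diagnostic; ratios far from 1 are strongly diagnostic.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Weak evidence: small, incremental moves.
print(round(update(0.4, 0.8), 2))   # 0.35
print(round(update(0.6, 1.25), 2))  # 0.65

# Strongly diagnostic evidence: a jump.
print(round(update(0.4, 10.0), 2))  # 0.87
```

The hard part in practice is not the arithmetic but judging how diagnostic a piece of news really is, which is exactly the "subtle clue versus misleading clue" problem described above.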

(5) Look for the clashing causal forces at work in each problem.

For every good policy argument, there is typically a counterargument that is at least worth acknowledging. For instance, if you are a devout dove who believes that threatening military action never brings peace, be open to the possibility that you might be wrong about Iran. And the same advice applies if you are a devout hawk who believes that soft “appeasement” policies never pay off. Each side should list, in advance, the signs that would nudge them toward the other.

Now here comes the really hard part. In classical dialectics, thesis meets antithesis, producing synthesis. In dragonfly eye, one view meets another and another and another—all of which must be synthesized into a single image. There are no paint-by-number rules here. Synthesis is an art that requires reconciling irreducibly subjective judgments. If you do it well, engaging in this process of synthesizing should transform you from a cookie-cutter dove or hawk into an odd hybrid creature, a dove-hawk, with a nuanced view of when tougher or softer policies are likelier to work.

(6) Strive to distinguish as many degrees of doubt as the problem permits but no more.

Few things are either certain or impossible. And “maybe” isn’t all that informative. So your uncertainty dial needs more than three settings. Nuance matters. The more degrees of uncertainty you can distinguish, the better a forecaster you are likely to be. As in poker, you have an advantage if you are better than your competitors at separating 60/40 bets from 40/60—or 55/45 from 45/55. Translating vague-verbiage hunches into numeric probabilities feels unnatural at first but it can be done. It just requires patience and practice. The superforecasters have shown what is possible.

Most of us could learn, quite quickly, to think in more granular ways about uncertainty. Recall the episode in which President Obama was trying to figure out whether Osama bin Laden was the mystery occupant of the walled-in compound in Abbottabad. And recall the probability estimates of his intelligence officers and the president’s reaction to their estimates: “This is fifty-fifty … a flip of the coin.” Now suppose that President Obama had been shooting the breeze with basketball buddies and each one offered probability estimates on the outcome of a college game—and those estimates corresponded exactly to those offered by intelligence officers on the whereabouts of Osama bin Laden. Would the president still have shrugged and said, “This is fifty-fifty,” or would he have said, “Sounds like the odds fall between three to one and four to one”? I bet on the latter. The president is accustomed to granular thinking in the domain of sports. Every year, he enjoys trying to predict the winners of the March Madness basketball tournament, a probability puzzle that draws the attention of serious statisticians. But, like his Democratic and Republican predecessors, he does not apply the same rigor to national security decisions. Why? Because different norms govern different thought processes. Reducing complex hunches to scorable probabilities is de rigueur in sports but not in national security.

So, don’t reserve rigorous reasoning for trivial pursuits. George Tenet would not have dared utter “slam dunk” about weapons of mass destruction in Iraq if the Bush 43 White House had enforced standards of evidence and proof that are second nature to seasoned gamblers on sporting events. Slam dunk implies one is willing to offer infinite odds—and to lose everything if one is wrong.
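The odds language in this passage converts directly into probabilities, which also shows why "slam dunk" is such a strong claim. A minimal sketch follows; the function name `odds_to_probability` is mine.

```python
def odds_to_probability(in_favour: float, against: float) -> float:
    """Convert odds of 'in_favour to against' into a probability."""
    return in_favour / (in_favour + against)


print(odds_to_probability(3, 1))  # 0.75
print(odds_to_probability(4, 1))  # 0.8

# As the odds offered grow without bound, the implied probability approaches 1.
for in_favour in (10, 100, 1_000_000):
    print(in_favour, odds_to_probability(in_favour, 1))
```

So "between three to one and four to one" means a probability between 75% and 80%, a long way from a coin flip, while infinite odds correspond to a probability of exactly 1.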

(7) Strike the right balance between under- and overconfidence, between prudence and decisiveness.

Superforecasters understand the risks both of rushing to judgment and of dawdling too long near “maybe.” They routinely manage the trade-off between the need to take decisive stands (who wants to listen to a waffler?) and the need to qualify their stands (who wants to listen to a blowhard?). They realize that long-term accuracy requires getting good scores on both calibration and resolution—which requires moving beyond blame-game ping-pong. It is not enough just to avoid the most recent mistake. They have to find creative ways to tamp down both types of forecasting errors—misses and false alarms—to the degree a fickle world permits such uncontroversial improvements in accuracy.
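Calibration and resolution can both be read off a decomposition of the Brier score (often attributed to Murphy): Brier score = reliability − resolution + uncertainty, where a low reliability term means good calibration and a high resolution term means decisive forecasts. The sketch below uses the one-sided Brier score on a 0 to 1 scale, which is an assumption on my part (the book's scores run from 0 to 2), and the sample forecasts and outcomes are invented for illustration.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    # Reliability: how far each stated probability is from the observed frequency.
    reliability = sum(
        len(obs) * (f - sum(obs) / len(obs)) ** 2 for f, obs in groups.items()
    ) / n
    # Resolution: how far the observed frequencies are from the overall base rate.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


# Hypothetical track record: ten forecasts and what actually happened.
forecasts = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5]
outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

rel, res, unc = brier_decomposition(forecasts, outcomes)
print(brier_score(forecasts, outcomes), rel - res + unc)
```

A waffler who always says 0.5 has zero resolution; a blowhard who always says 0 or 1 pays heavily in the reliability term whenever they are wrong. Doing well on both at once is the balance this commandment describes.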

(8) Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases.

Don’t try to justify or excuse your failures. Own them! Conduct unflinching postmortems: Where exactly did I go wrong? And remember that although the more common error is to learn too little from failure and to overlook flaws in your basic assumptions, it is also possible to learn too much (you may have been basically on the right track but made a minor technical mistake that had big ramifications). Also don’t forget to do postmortems on your successes too. Not all successes imply that your reasoning was right. You may have just lucked out by making offsetting errors. And if you keep confidently reasoning along the same lines, you are setting yourself up for a nasty surprise.

(9) Bring out the best in others and let others bring out the best in you.

Master the fine arts of team management, especially perspective taking (understanding the arguments of the other side so well that you can reproduce them to the other’s satisfaction), precision questioning (helping others to clarify their arguments so they are not misunderstood), and constructive confrontation (learning to disagree without being disagreeable). Wise leaders know how fine the line can be between a helpful suggestion and micromanagerial meddling or between a rigid group and a decisive one or between a scatterbrained group and an open-minded one. Tommy Lasorda, the former coach of the Los Angeles Dodgers, got it roughly right: “Managing is like holding a dove in your hand. If you hold it too tightly you kill it, but if you hold it too loosely, you lose it.”

(10) Master the error-balancing bicycle.

Implementing each commandment requires balancing opposing errors. Just as you can’t learn to ride a bicycle by reading a physics textbook, you can’t become a superforecaster by reading training manuals. Learning requires doing, with good feedback that leaves no ambiguity about whether you are succeeding—“I’m rolling along smoothly!”—or whether you are failing—“crash!” Also remember that practice is not just going through the motions of making forecasts, or casually reading the news and tossing out probabilities. Like all other known forms of expertise, superforecasting is the product of deep, deliberative practice.

(11) Don’t treat commandments as commandments.

“It is impossible to lay down binding rules,” Helmuth von Moltke warned, “because two cases will never be exactly the same.” As in war, so in all things. Guidelines are the best we can do in a world where nothing is certain or exactly repeatable. Superforecasting requires constant mindfulness, even when—perhaps especially when—you are dutifully trying to follow these commandments.

Long Interview / Roundtable with Tetlock on Edge.org

For tournaments to have a positive effect on society, we need to make a very concerted effort to improve the quality of the question generation process and to engage people in public debates to participate in that. The problem here is, and this is where I tend to come a little closer to Danny’s pessimism on this, it’s hard to convince someone who’s a high status incumbent to play in a game in which the best plausible outcome is you’re going to break even. Your fans already expect you to win, so if you win you’re basically breaking even. The more likely outcome is you’re not going to do all that well because there is a somewhat loose coupling and many pundits’ forecasting expertise probably is overrated.

Fun stuff:

The reason there’s not a big market for foxy case studies in business schools is because MBAs would probably recoil from them and business schools are pretty customer-friendly.

Slides from the Edge Masterclass with Philip Tetlock

Note: GJP refers to the Good Judgment Project, the research study on forecasting that Tetlock discusses throughout the book. The slides below summarise the findings from GJP and the participants’ success in IARPA’s forecasting tournaments.

Photo by Nicole Wilcox on Unsplash
