Bias Is a Big Problem. But So Is ‘Noise.’

A useful discussion in the context of human and AI decision-making: AI provides greater consistency (less noise, or variability, than humans) but carries the risk of bias being built into the algorithms, which underlines the importance of distinguishing the two when assessing decision-making:

The word “bias” commonly appears in conversations about mistaken judgments and unfortunate decisions. We use it when there is discrimination, for instance against women or in favor of Ivy League graduates. But the meaning of the word is broader: A bias is any predictable error that inclines your judgment in a particular direction. For instance, we speak of bias when forecasts of sales are consistently optimistic or investment decisions overly cautious.

Society has devoted a lot of attention to the problem of bias — and rightly so. But when it comes to mistaken judgments and unfortunate decisions, there is another type of error that attracts far less attention: noise.

To see the difference between bias and noise, consider your bathroom scale. If on average the readings it gives are too high (or too low), the scale is biased. If it shows different readings when you step on it several times in quick succession, the scale is noisy. (Cheap scales are likely to be both biased and noisy.) While bias is the average of errors, noise is their variability.
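The bathroom-scale distinction can be made concrete in a few lines of Python. This is a minimal sketch in which the true weight and the readings are invented for illustration: bias is the mean of the errors, noise their standard deviation.

```python
import statistics

# Hypothetical readings from a bathroom scale for a person whose
# true weight is 70.0 kg (all numbers are invented for illustration).
true_weight = 70.0
readings = [71.2, 70.8, 71.5, 70.9, 71.1]

errors = [r - true_weight for r in readings]

bias = statistics.mean(errors)    # average error: this scale reads high
noise = statistics.stdev(errors)  # variability of the errors

print(f"bias  = {bias:+.2f} kg")   # positive: systematically too high
print(f"noise = {noise:.2f} kg")   # spread across repeated weighings
```

A scale could have zero bias (errors averaging out to nothing) and still be badly noisy, or read consistently but always 1 kg high; the two numbers capture different failures.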

Although it is often ignored, noise is a large source of malfunction in society. In a 1981 study, for example, 208 federal judges were asked to determine the appropriate sentences for the same 16 cases. The cases were described by the characteristics of the offense (robbery or fraud, violent or not) and of the defendant (young or old, repeat or first-time offender, accomplice or principal). You might have expected judges to agree closely about such vignettes, which were stripped of distracting details and contained only relevant information.

But the judges did not agree. The average difference between the sentences that two randomly chosen judges gave for the same crime was more than 3.5 years. Considering that the mean sentence was seven years, that was a disconcerting amount of noise.

Noise in real courtrooms is surely only worse, as actual cases are more complex and difficult to judge than stylized vignettes. It is hard to escape the conclusion that sentencing is in part a lottery, because the punishment can vary by many years depending on which judge is assigned to the case and on the judge’s state of mind on that day. The judicial system is unacceptably noisy.

Consider another noisy system, this time in the private sector. In 2015, we conducted a study of underwriters in a large insurance company. Forty-eight underwriters were shown realistic summaries of risks to which they assigned premiums, just as they did in their jobs.

How much of a difference would you expect to find between the premium values that two competent underwriters assigned to the same risk? Executives in the insurance company said they expected about a 10 percent difference. But the typical difference we found between two underwriters was an astonishing 55 percent of their average premium — more than five times as large as the executives had expected.
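As a hypothetical illustration of how such a figure is computed, the gap between two quotes can be expressed as a share of their average. The premium values below are invented, chosen only so that the relative difference lands near the 55 percent the study reports.

```python
def relative_difference(a: float, b: float) -> float:
    """Difference between two quotes as a share of their average."""
    return abs(a - b) / ((a + b) / 2)

# Two hypothetical premiums quoted for the same risk (invented numbers):
gap = relative_difference(9_500, 16_700)
print(f"{gap:.0%}")  # roughly 55% of the average premium
```

Note that a 55 percent relative difference means one underwriter's quote can be well over half again as large as the other's for the identical risk.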

Many other studies demonstrate noise in professional judgments. Radiologists disagree on their readings of images and cardiologists on their surgery decisions. Forecasts of economic outcomes are notoriously noisy. Sometimes fingerprint experts disagree about whether there is a “match.” Wherever there is judgment, there is noise — and more of it than you think.

Noise causes error, as does bias, but the two kinds of error are separate and independent. A company’s hiring decisions could be unbiased overall if some of its recruiters favor men and others favor women. However, its hiring decisions would be noisy, and the company would make many bad choices. Likewise, if one insurance policy is overpriced and another is underpriced by the same amount, the company is making two mistakes, even though there is no overall bias.

Where does noise come from? There is much evidence that irrelevant circumstances can affect judgments. In the case of criminal sentencing, for instance, a judge’s mood, fatigue and even the weather can all have modest but detectable effects on judicial decisions.

Another source of noise is that people can have different general tendencies. Judges often vary in the severity of the sentences they mete out: There are “hanging” judges and lenient ones.

A third source of noise is less intuitive, although it is usually the largest: People can have not only different general tendencies (say, whether they are harsh or lenient) but also different patterns of assessment (say, which types of cases they believe merit being harsh or lenient about). Underwriters differ in their views of what is risky, and doctors in their views of which ailments require treatment. We celebrate the uniqueness of individuals, but we tend to forget that, when we expect consistency, uniqueness becomes a liability.

Once you become aware of noise, you can look for ways to reduce it. For instance, independent judgments from a number of people can be averaged (a frequent practice in forecasting). Guidelines, such as those often used in medicine, can help professionals reach better and more uniform decisions. As studies of hiring practices have consistently shown, imposing structure and discipline in interviews and other forms of assessment tends to improve judgments of job candidates.
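The first of those techniques, averaging independent judgments, can be sketched in a short simulation. Everything below is a hypothetical illustration (the "judges", the 7-year true value and the noise level are all invented): the noise of an average of n independent judgments shrinks by roughly the square root of n.

```python
import random
import statistics

random.seed(42)

true_value = 7.0  # e.g. an appropriate sentence, in years
sigma = 2.0       # noise in a single judgment

def judgment() -> float:
    """One judge's estimate: the true value plus independent noise."""
    return random.gauss(true_value, sigma)

def averaged_judgment(n_judges: int) -> float:
    """Average the independent judgments of several judges."""
    return statistics.mean(judgment() for _ in range(n_judges))

trials = 10_000
solo = [judgment() for _ in range(trials)]
panel = [averaged_judgment(9) for _ in range(trials)]

print(f"noise of one judge:         {statistics.stdev(solo):.2f}")
print(f"noise of a 9-judge average: {statistics.stdev(panel):.2f}")  # ~ sigma / 3
```

Averaging only removes noise, not bias: if every judge leans harsh, the panel's average leans harsh too, just more consistently.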

No noise-reduction techniques will be deployed, however, if we do not first recognize the existence of noise. Noise is too often neglected. But it is a serious issue that results in frequent error and rampant injustice. Organizations and institutions, public and private, will make better decisions if they take noise seriously.

Daniel Kahneman is an emeritus professor of psychology at Princeton and a recipient of the 2002 Nobel Memorial Prize in Economic Sciences. Olivier Sibony is a professor of strategy at the HEC Paris business school. Cass R. Sunstein is a law professor at Harvard. They are the authors of the forthcoming book “Noise: A Flaw in Human Judgment,” on which this essay is based.


You’re Not Going to Change Your Mind – The New York Times

As someone who is always interested in how we think and process information, and a great fan of Kahneman, Thaler and Sunstein’s work on behavioural economics and nudges, I found this article on desirability bias of interest:

But what if confirmation bias isn’t the only culprit? It recently struck us that confirmation bias is often conflated with “telling people what they want to hear,” which is actually a distinct phenomenon known as desirability bias, or the tendency to credit information you want to believe. Though there is a clear difference between what you believe and what you want to believe — a pessimist may expect the worst but hope for the best — when it comes to political beliefs, they are frequently aligned.

For example, gun-control advocates who believe stricter firearms laws will reduce gun-related homicides usually also want to believe that such laws will reduce gun-related homicides. If those advocates decline to revise their beliefs in the face of evidence to the contrary, it can be hard to tell which bias is at work.

So we decided to conduct an experiment that would isolate these biases. This way, we could see whether a reluctance to revise political beliefs was a result of confirmation bias or desirability bias (or both). Our experiment capitalized on the fact that one month before the 2016 presidential election there was a profusion of close polling results concerning Donald Trump and Hillary Clinton.

We asked 900 United States residents which candidate they wanted to win the election, and which candidate they believed was most likely to win. Respondents fell into two groups. In one group were those who believed the candidate they wanted to win was also most likely to win (for example, the Clinton supporter who believed Mrs. Clinton would win). In the other group were those who believed the candidate they wanted to win was not the candidate most likely to win (for example, the Trump supporter who believed Mrs. Clinton would win). Each person in the study then read about recent polling results emphasizing either that Mrs. Clinton or Mr. Trump was more likely to win.

Roughly half of our participants believed their preferred candidate was the one less likely to win the election. For those people, the desirability of the polling evidence was decoupled from its value in confirming their beliefs.

After reading about the recent polling numbers, all the participants once again indicated which candidate they believed was most likely to win. The results, which we report in a forthcoming paper in the Journal of Experimental Psychology: General, were clear and robust. Those people who received desirable evidence — polls suggesting that their preferred candidate was going to win — took note and incorporated the information into their subsequent belief about which candidate was most likely to win the election. In contrast, those people who received undesirable evidence barely changed their belief about which candidate was most likely to win.

Importantly, this bias in favor of the desirable evidence emerged irrespective of whether the polls confirmed or disconfirmed peoples’ prior belief about which candidate would win. In other words, we observed a general bias toward the desirable evidence.

What about confirmation bias? To our surprise, those people who received confirming evidence — polls supporting their prior belief about which candidate was most likely to win — showed no bias in favor of this information. They tended to incorporate this evidence into their subsequent belief to the same extent as those people who had their prior belief disconfirmed. In other words, we observed little to no bias toward the confirming evidence.

We also explored which supporters showed the greatest bias in favor of the desirable evidence. The results were bipartisan: Supporters of Mr. Trump and supporters of Mrs. Clinton showed a similar-size bias in favor of the desirable evidence.

Our study suggests that political belief polarization may emerge because of peoples’ conflicting desires, not their conflicting beliefs per se. This is rather troubling, as it implies that even if we were to escape from our political echo chambers, it wouldn’t help much. Short of changing what people want to believe, we must find other ways to unify our perceptions of reality.

ICYMI: The Choice Explosion – The New York Times

Interesting insights on decision-making in the book Decisive, by Chip and Dan Heath:

It’s becoming incredibly important to learn to decide well, to develop the techniques of self-distancing to counteract the flaws in our own mental machinery. The Heath book is a very good compilation of those techniques.

For example, they mention the maxim, assume positive intent. When in the midst of some conflict, start with the belief that others are well-intentioned. It makes it easier to absorb information from people you’d rather not listen to.

They highlight Suzy Welch’s 10-10-10 rule. When you’re about to make a decision, ask yourself how you will feel about it 10 minutes from now, 10 months from now and 10 years from now. People are overly biased by the immediate pain of some choice, but they can put the short-term pain in long-term perspective by asking these questions.

The Heaths recommend making deliberate mistakes. A survey of new brides found that 20 percent were not initially attracted to the man they ended up marrying. Sometimes it’s useful to make a deliberate “mistake” — agreeing to dinner with a guy who is not your normal type. Sometimes you don’t really know what you want and the filters you apply are hurting you.

They mention our tendency to narrow-frame, to see every decision as a binary “whether or not” alternative. Whenever you find yourself asking “whether or not,” it’s best to step back and ask, “How can I widen my options?” In other words, before you ask, “Should I fire this person?” Ask, “Is there any way I can shift this employee’s role to take advantage of his strengths and avoid his weaknesses?”

The explosion of choice means we all need more help understanding the anatomy of decision-making. It makes you think that we should have explicit decision-making curriculums in all schools. Maybe there should be a common course publicizing the work of Daniel Kahneman, Cass Sunstein, Dan Ariely and others who study the way we mess up and the techniques we can adopt to prevent error.

Source: The Choice Explosion – The New York Times

Public servants flock to PCO’s first-ever behavioural economics briefing

I am a fan of nudges, and Kirkman captures the reality that current policies already incorporate nudges; the question is therefore which kind of nudge is most effective as part of policy and program design, rather than a more existential one about whether to nudge at all.

As readers already know, I am also a fan of behavioural economics, and found the insights in Kahneman’s Thinking, Fast and Slow particularly relevant to policy makers, who may not be as aware of their own thinking processes as they need to be:

Elspeth Kirkman, head of the Behavioural Insights Team’s North American operations, was asked during a presentation how she responds to criticism that she’s involved in “social engineering.” She said governments cannot get away from the fact they have to encourage certain kinds of behaviour from people, so it might as well be done effectively.

“Departments and governments are already nudging people in terms of how they present information to them, how they ask them to do things, how they structure their defaults, and all we’re doing really is being mindful about that,” she said. “We’re saying, actually, let’s just understand what the implication in the way that we’re structuring that choice is.”

Eldar Shafir, a professor of psychology and public affairs at Princeton University, told the audience that sometimes more than a “nudge” is needed when it comes to public policy.

“I’m a big fan of nudges … but nudges are a very modest attempt to interfere minimally, often at a very low cost, when you’re politically somewhat helpless, in ways that help people,” he said.

“But there’s a lot more than that. And if you think about what policy does throughout, whether it’s the design of emergency rooms or what it takes to make a nation healthy and happy, there are profound psychological questions that lie at the core of what we do.”

When asked to elaborate on what qualifies as a nudge and what counts as more, Mr. Shafir noted how buildings are often designed—in terms of where stairs, elevators, and parking lots are placed—to promote physical activity, and he feels buildings constructed in ways that encourage certain behaviours represent the kind of policy that goes beyond nudges.

Ms. Kirkman talked about an EAST model—which stands for easy, attractive, social, and timely—for creating conditions for public compliance with government policies.

She talked about using plain language and less “legalese” to make it easier for people to understand government communications. She used an example of a U.S. city that had an unfortunate practice of sending out very technically worded letters to homeowners whose properties did not meet municipal standards.

“The letter actually starts with: ‘According to Chapter 156 and/or Chapter 155 and/or Chapter 37 in the [municipal] ordinances process, we have found your property to be in violation of inspection.’ And it kind of just goes on and on and on like this, and it doesn’t actually say, ‘Hey, you need to fix your property and here’s what’s wrong with it.’ ”

In terms of making things attractive, Ms. Kirkman used the example of different styles of text message sent by a job centre in Britain to inform unemployed people that a new supermarket was holding a job fair. She said 10 per cent of the people notified would typically attend such a non-mandatory event. However, when people’s individual names were used in the message, that increased to 15 per cent. When the message appeared to come from the unemployed people’s employment advisers, it increased to 17 per cent. Finally, the rate increased to 26 per cent when the individuals were told their advisers had booked them a time-slot at the event.

The social aspect of encouraging certain actions is shown by Mr. Treusch’s example of publicizing how most people pay their taxes, Ms. Kirkman said.

Another factor is who conveys the message, she said. She recalled how the British government once sent letters signed by the chief medical officer that advised certain physicians to prescribe antibiotics less often, and the campaign was a success. She said the message would have been less effective with this particular audience if it came from the health minister. These physicians were also told how the majority of their peers were prescribing fewer antibiotics, she added.

An example of timeliness focused on a police force in Britain that was found to be much less ethnically diverse than the community it serves. Research ultimately uncovered that most applicants of minority ethnicities were failing an online test in which they were asked how they would react, as a police officer, to certain situations.

Ms. Kirkman said it’s believed the effect of “stereotype threat” was at work, where people who are part of groups that have negative stereotypes tend to perform worse in certain instances if reminded of those stereotypes just before the task.

She said when the wording of the email asking applicants to take this test was changed to be “warmer” and contain a preamble asking them to think about what it would mean to their community if they became a police officer, the gap in success in the test between white applicants and others was closed.

Source: Public servants flock to PCO’s first-ever behavioural economics briefing

How diversity actually makes us smarter

In other words, diversity requires us to shift from automatic to deliberative thinking (from System 1 to System 2 in Kahneman’s terms):

In more homogenous parishes, towns, states and countries, residents aren’t necessarily obliged to take that extra intellectual step. In places where the overwhelming majority of residents share a common background, they are more likely to maintain an unspoken consensus about the meaning of institutions and practices. That consensus, Dutch philosopher Bart van Leeuwen reminds us, is enforced “through sayings and jokes, in ways of speaking and moving, and in subtle facial expressions that betray surprise or recognition.” In other words, the way things are is so self-evident that they don’t require a second thought.

Diversity, however, requires second thoughts. When the consensus is challenged in a homogenous place by the presence of new people, things get interesting. The familiar signs and symbols that undergird our implicit understanding of the world can change in meaning. The presence of conflicting worldviews causes confusion, uncertainty, and alienation for holdovers and newcomers alike. These feelings can either cause people to draw back into themselves — or force them to articulate and justify themselves to those who don’t share their view of the world. Or both.

Because of our long history of immigration, the disruptions of diversity have been commonplace in American life. The late historian Timothy L. Smith famously called migration to the U.S. a “theologizing experience” that forced newcomers into the existential dilemma of having to “determine how to act in these new circumstances by reference not simply to a dominant ‘host’ culture but to a dozen competing subcultures, all of which were in the process of adjustment.”

How diversity actually makes us smarter – The Washington Post.

Moral Judgments Depend on What Language We’re Speaking

An interesting psychological experiment on language. I believe there have been similar experiments with managers working in a second language; operating in a second language both slows down their thinking (Kahneman’s System 2) and removes some of the emotion:

But we’ve got some surprising news. In a study recently published in the journal PloS One, our two research teams, working independently, discovered that when people are presented with the trolley problem in a foreign language, they are more willing to sacrifice one person to save five than when they are presented with the dilemma in their native tongue.

One research team, working in Barcelona, recruited native Spanish speakers studying English and vice versa and randomly assigned them to read this dilemma in either English or Spanish. In their native tongue, only 18 percent said they would push the man, but in a foreign language, almost half (44 percent) would do so. The other research team, working in Chicago, found similar results with languages as diverse as Korean, Hebrew, Japanese, English and Spanish. For more than 1,000 participants, moral choice was influenced by whether the language was native or foreign. In practice, our moral code might be much more pliable than we think.

Extreme moral dilemmas are supposed to touch the very core of our moral being. So why the inconsistency? The answer, we believe, is reminiscent of Nelson Mandela’s advice about negotiation: “If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.” As psychology researchers such as Catherine Caldwell-Harris have shown, in general people react less strongly to emotional expressions in a foreign language.

An aversion to pushing the large man onto the tracks seems to engage a deeply emotional part of us, whereas privileging five lives over one appears to result from a less emotional, more utilitarian calculus. Accordingly, when our participants faced this dilemma in their native tongue, they reacted more emotionally and spared the man. A foreign language, in contrast, seemed to provide participants with an emotional distance that resulted in the less visceral choice to save the five people.

If this explanation is correct, then you would expect that a less emotionally vivid version of the same dilemma would minimize the difference between being presented with it in a foreign versus a native language. And this indeed is what we found. We conducted the same experiment using a dilemma almost identical to the footbridge — but with one crucial difference. In this version, you can save the five people by diverting the trolley to a track where the large man is, rather than by actively shoving him off the bridge.

Source: Moral Judgments Depend on What Language We’re Speaking

Language and morality: Gained in translation

Interesting finding but not surprising that slowing down thinking, through the language barrier, can lead to more rational outcomes:

Several psychologists, including Daniel Kahneman, who was awarded the Nobel prize in economics in 2002 for his work on how people make decisions, think that the mind uses two separate cognitive systems—one for quick, intuitive decisions and another that makes slower, more reasoned choices. These can conflict, which is what the trolley dilemma is designed to provoke: normal people have a moral aversion to killing (the intuitive system), but can nonetheless recognise that one death is, mathematically speaking, better than five (the reasoning system).

This latest study fits with other research which suggests that speaking a foreign language boosts the second system—provided, that is, you don’t speak it as well as a native. Earlier work, by some of the same scholars who performed this new study, found that people tend to fare better on tests of pure logic in a foreign language—and particularly on questions with an obvious-but-wrong answer and a correct answer that takes time to work out.

Dr Costa and his colleagues hypothesise that, while fluent speakers can form sentences effortlessly, the merely competent must spend more brainpower, and reason much more carefully, when operating in their less-familiar tongue. And that kind of thinking helps to provide psychological and emotional distance, in much the same way that replacing the fat man with a switch does. As further support for that idea, the researchers note that the effect of speaking the foreign language became smaller as the speaker’s familiarity with it increased.

Language and morality: Gained in translation | The Economist.

Decision-Making: Refugee claim acceptance in Canada appears to be ‘luck of the draw’ despite reforms, analysis shows

Interesting from a decision-making perspective.

Reading this reminded me of some of Daniel Kahneman’s similar work, where he showed considerable variability in decision-making, even depending on the time of day. It is a reminder of how difficult it is to ensure consistent decision-making: people being people, automatic thinking, which reflects our experiences and perceptions, is often as important as more deliberative thinking. There are no easy solutions, but regular analysis of decisions, along with feedback, may help:

There are legitimate reasons why decisions by some adjudicators lean in one direction, such as adjudicators specializing in claimants from a certain region. (Someone hearing cases from Syria will have a higher acceptance rate than someone hearing claims from France.) Some members hear more expedited cases, which are typically urgent claims with specific aggravating or mitigating facts.

“My view is that even when you try to control for those sorts of differences, a very large difference in acceptance rates still exists,” said Mr. Rehaag. “You get into the more idiosyncratic elements of individual identity.”

These may reflect the politics of the adjudicator or impressions about a country. If adjudicators have been on a relaxing holiday in a country they may be less likely to accept a claimant faces horrors there.

Refugee claim acceptance in Canada appears to be ‘luck of the draw’ despite reforms, analysis shows | National Post.

Daniel Kahneman Testimonials

For policy wonks and “nudge nerds”, a good collection of testimonials to the impact of Daniel Kahneman’s work, summarized in his best-selling book, Thinking, Fast and Slow. One example from Richard Thaler and Sendhil Mullainathan:

Kahneman and Tversky’s work did not just attack rationality, it offered a constructive alternative: a better description of how humans think. People, they argued, often use simple rules of thumb to make judgments, which incidentally is a pretty smart thing to do. But this is not the insight that left us one step from doing behavioral economics. The breakthrough idea was that these rules of thumb could be catalogued. And once understood they can be used to predict where people will make systematic errors. Those two words are what made behavioral economics possible.

Consider their famous representativeness heuristic, the tendency to judge probabilities by similarity. Use of this heuristic can lead people to make forecasts that are too extreme, often based on sample sizes that are too small to offer reliable predictions. As a result, we can expect forecasters to be predictably surprised when they draw on small samples. When they are very optimistic, the outcomes will tend to be worse than they thought, and unduly pessimistic forecasts will lead to pleasant but unexpected surprises. To the great surprise of economists who had put great faith in the efficiency of markets, this simple idea led to the discovery of large mispricing in domains that vary from stock markets to the selection of players in the National Football League.