By Naomi Oreskes, Michael Oppenheimer, Dale Jamieson
Recently, the U.K. Met Office announced a revision to the Hadley Centre historical analysis of sea surface temperatures (SST), suggesting that the oceans have warmed about 0.1 degree Celsius more than previously thought. The need for revision arises from the long-recognized problem that in the past sea surface temperatures were measured using a variety of error-prone methods, such as open buckets, lamb’s wool–wrapped thermometers, and canvas bags. It was not until the 1990s that oceanographers developed a network of consistent and reliable measurement buoys.
Then, to develop a consistent picture of long-term trends, techniques had to be developed to compensate for the errors in the older measurements and reconcile them with the newer ones. The Hadley Centre has led this effort, and the new data set—dubbed HadSST4—is a welcome advance in our understanding of global climate change.
But that’s where the good news ends. Because the oceans cover roughly 70 percent of the globe, this correction implies that previous estimates of overall global warming have been too low. Moreover, it was reported recently that in the one place where it was carefully measured, the underwater melting that is driving disintegration of ice sheets and glaciers is occurring far faster than predicted by theory (as much as two orders of magnitude faster), throwing current model projections of sea level rise further into doubt.
These recent updates, suggesting that climate change and its impacts are emerging faster than scientists previously thought, are consistent with observations that we and other colleagues have made. We have identified a pattern in assessments of climate research: certain key climate indicators have been underestimated, and with them the threat of climate disruption. When new observations of the climate system have provided more or better data, or permitted us to reevaluate old data, the findings for ice extent, sea level rise and ocean temperature have generally been worse than earlier prevailing views.
Consistent underestimation is a form of bias—in the literal meaning of a systematic tendency to lean in one direction or another—which raises the question: what is causing this bias in scientific analyses of the climate system?
The question is significant for two reasons. First, climate skeptics and deniers have often accused scientists of exaggerating the threat of climate change, but the evidence shows that scientists have not exaggerated the threat; they have underestimated it. This is important for the interpretation of the scientific evidence, for the defense of the integrity of climate science, and for public comprehension of the urgency of the climate issue. Second, objectivity is an essential ideal in scientific work, so if we have evidence that findings are biased in any direction, whether toward alarmism or complacency, this should concern us. We should seek to identify the sources of that bias and correct them if we can.
In our new book, Discerning Experts, we explored the workings of scientific assessments for policy, with particular attention to their internal dynamics, as we attempted to illuminate how the scientists working in assessments make the judgments they do. Among other things, we wanted to know how scientists respond to the pressures—sometimes subtle, sometimes overt—that arise when they know that their conclusions will be disseminated beyond the research community—in short, when they know that the world is watching. The view that scientific evidence should guide public policy presumes that the evidence is of high quality, and that scientists’ interpretations of it are broadly correct. But, until now, those assumptions have rarely been closely examined.
We found little reason to doubt the results of scientific assessments, overall. We found no evidence of fraud, malfeasance or deliberate deception or manipulation. Nor did we find any reason to doubt that scientific assessments accurately reflect the views of their expert communities. But we did find that scientists tend to underestimate the severity of threats and the rapidity with which they might unfold.
Among the factors that appear to contribute to underestimation is the perceived need for consensus, or what we label univocality: the felt need to speak in a single voice. Many scientists worry that if disagreement is publicly aired, government officials will conflate differences of opinion with ignorance and use this as justification for inaction. Others worry that even if policy makers want to act, they will find it difficult to do so if scientists fail to send an unambiguous message. Therefore, they will actively seek to find their common ground and focus on areas of agreement; in some cases, they will only put forward conclusions on which they can all agree.
How does this lead to underestimation? Consider a case in which most scientists think that the correct answer to a question is in the range 1–10, but some believe that it could be as high as 100. In such a case, everyone will agree that it is at least 1–10, but not everyone will agree that it could be as high as 100. Therefore, the area of agreement is 1–10, and this is reported as the consensus view. Wherever there is a range of possible outcomes that includes a long, high-end tail of probability, the area of overlap will necessarily lie at or near the low end. Error bars can be (and generally are) used to express the range of possible outcomes, but it may be difficult to achieve consensus on the high end of the error estimate.
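To make the arithmetic concrete, here is a minimal sketch in Python. It assumes that each expert's view can be reduced to a simple (low, high) range, which is a simplification; the function name and the five-member panel are hypothetical illustrations, not data from any assessment.

```python
def area_of_agreement(ranges):
    """Return the (low, high) interval that every expert accepts, or None."""
    low = max(lo for lo, _ in ranges)    # highest of the lower bounds
    high = min(hi for _, hi in ranges)   # lowest of the upper bounds
    return (low, high) if low <= high else None


# Hypothetical panel: four experts say 1-10, one says it could reach 100.
panel = [(1, 10), (1, 10), (1, 10), (1, 10), (1, 100)]

consensus = area_of_agreement(panel)
# Full spread of views, from the lowest low to the highest high.
widest = (min(lo for lo, _ in panel), max(hi for _, hi in panel))

print("Reported consensus:", consensus)
print("Full range of expert views:", widest)
```

Under these assumptions the reported consensus is 1–10 even though one expert's range extends to 100: taking the overlap discards the high-end tail by construction.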
The push toward agreement may also be driven by a mental model that sees facts as matters about which all reasonable people should be able to agree versus differences of opinion or judgment that are potentially irresolvable. If the conclusions of an assessment report are not univocal, then (it may be thought that) they will be viewed as opinions rather than facts and dismissed not only by hostile critics but even by friendly forces. The drive toward consensus may therefore be an attempt to present the findings of the assessment as matters of fact rather than judgment.
The impulse toward univocality arose strongly in a debate over how to characterize the risk of disintegration of the West Antarctic Ice Sheet (WAIS) in the Fourth Assessment Report of the IPCC (AR4). Nearly all experts agreed there was such a risk as climate warmed, but some thought it was only very far in the future while others thought it might be more imminent. An additional complication was that some scientists felt that the available data were simply not sufficient to draw any defensible conclusion about the short-term risk, and so they made no estimate at all.
However, everyone concurred that, if WAIS did not disintegrate soon, it would likely disintegrate in the long run. Therefore, the area of agreement lay in the domain of the long run—the conclusion of a non-imminent risk—and so that is what was reported. The result was a minimalist conclusion, and we know now that the estimates that were offered were almost certainly too low.
This offers a significant point of contrast with academic science, where there is no particular pressure to achieve agreement by any particular deadline (except perhaps within a lab group, in order to be able to publish findings or meet a grant proposal deadline). Moreover, in academic life scientists garner attention and sometimes prestige by disagreeing with their colleagues, particularly if the latter are prominent. The reward structure of academic life leans toward criticism and dissent; the demands of assessment push toward agreement.
A second reason for underestimation involves an asymmetry in how scientists think about error and its effects on their reputations. Many scientists worry that if they overestimate a threat, they will lose credibility, whereas if they underestimate it, it will have little (if any) reputational impact. In climate science, this anxiety is reinforced by the drumbeat of climate denial, in which scientists are accused of being “alarmists” who “exaggerate the threat.” In this context, scientists may go the extra mile to disprove the stereotype by downplaying known risks and denying critics the opportunity to label them as alarmists.
Many scientists consider underestimates to be “conservative,” because they are conservative with respect to the question of when to sound an alarm or how loudly to sound it. The logic of this can be questioned, because underestimation is not conservative when viewed in terms of giving people adequate time to prepare. (Consider, for example, an underestimate of an imminent hurricane, tornado, or earthquake.) In the AR4 WAIS debate, scientists underestimated the threat of rapid ice sheet disintegration because many of the scientists who participated were more comfortable with an estimate that they viewed as “conservative” than with one that was not.
The combination of these three factors (the push for univocality, the belief that conservatism is socially and politically protective, and the reluctance to make estimates at all when the available data are contradictory) can lead to “least common denominator” results: minimalist conclusions that are weak or incomplete.
Moreover, if consensus is viewed as a requirement, scientists may avoid discussing tricky issues that engender controversy (but might still be important), or exclude certain experts whose opinions are known to be “controversial” (but may nevertheless have pertinent expertise). They may also consciously or unconsciously pull back from reporting on extreme outcomes. (Elsewhere we have labeled this tendency “erring on the side of least drama.”) In short, the push for agreement and caution may undermine other important goals, including inclusivity, accuracy and comprehension.
We are not suggesting that every example of underestimation is necessarily caused by the factors we observed in our work, nor that the demand for consensus always leads to conservatism. Without looking closely at any given case, we cannot be sure whether the effects we observed are operating or not. But we found that the pattern of underestimation that we observed in the WAIS debate also occurred in assessments of acid rain and the ozone hole.
We found that the institutional aspects of assessment, including who the authors are and how they are chosen, how the substance is divided into chapters, and guidance emphasizing consensus, also militate in favor of scientific conservatism. Thus, so far as our evidence goes, it appears that scientists working in assessments are more likely to underestimate than to overestimate threats.
In our book, we make some concrete recommendations. While scientists in assessments generally aim for consensus, we suggest that they should not view consensus as a goal of the assessment. Depending on the state of scientific knowledge, consensus may or may not emerge from an assessment, but it should not be viewed as something that needs to be achieved and certainly not as something to be enforced. Where there are substantive differences of opinion, they should be acknowledged and the reasons for them explained (to the extent that they can be explained). Scientific communities should also be open to experimenting with alternative models for making and expressing group judgments, and to learning more about how policy makers actually interpret the findings that result.