In this post I want to bring together two things that I seem to be talking a lot about, especially in the wake of our run of record high temperatures. They are

- What is the main component of the error that is quoted on global anomaly average for some period (month, year)? and
- Why use anomalies? (an old perennial, see also GISS, NOAA)

I'll use the

USHCN V2.5 dataset as a worked example, since I'm planning to write a bit more about some recent misuse of that. In particular I'll use the adjusted USHCN for 2011.

####
Using anomalies

I have been finding it necessary to go over some essentials of using anomalies. The basic arithmetic is

- Compute some "normal" (usually a 30-year period time average for each month) for each station in the network,
- Form local anomalies by subtracting the relevant normal from each reading
- Average the anomalies (usually area-weighted)

People tend to think that you get the aanomaly average just by averaging, then subtracting an offset. That is quite wrong; they must be formed before averaging. Afterward you can shift to a different anomaly base by offsetting the mean.

####
Coverage error - spatial sampling error for the mean.

Indices like GISS and HADCRUT usually quote a monthly or annual mean with an uncertainty of up to 0.1°C. In recent years contrarians have seized on this to say that maybe it isn't a record at all - a "statistical tie" is a pet phrase, for those whose head hurts thinking about statistics. But what very few people understand is what that uncertainty means. I'll quote here from something I wrote at WUWT:

The way to think about stated uncertainties is that they represent the range of results that could have been obtained if things had been done differently. And so the question is, which "things". This concept is made explicit in the HADCRUT ensemble approach, where they do 100 repeated runs, looking at each stage in which an estimated number is used, and choosing other estimates from a distribution. Then the actual spread of results gives the uncertainty. Brohan et al 2006 lists some of the things that are varied.

The underlying concept is sampling error. Suppose you conduct a poll, asking 1000 people if they will vote for A or B. You find 52% for A. The uncertainty comes from, what if you had asked different people? For temperature, I'll list three sources of error important in various ways:

1. Measurement error. This is what many people think uncertainties refer to, but it usually isn't. measurement errors become insignificant because of the huge number of data that is averaged. measurement error estimates what could happen if you had used different observers or instruments to make the same observation, same time, same place.

2. Location uncertainty. Ths is dominant for global annual and monthly averages.You measured in sampled locations - what if the sample changed? You measured in different places around the earth? Same time, different places.

3. Trend uncertainty, what we are talking about above. You get trend from a statistical model, in which the residuals are assumed to come from a random distribution, representing unpredictable aspects (weather). The trend uncertainty is calculated on the basis of, what if you sampled differently from that distribution? Had different weather? This is important for deciding if your trend is something that might happen again in the future. If it is a rare event, maybe. But it is not a test of whether it really happened. We know how the weather turned out.
So here I'm talking about location uncertainty. What if you had sampled in different places. And in this exercise I'll do just that. I'll choose subsets of 500 of the USHCN and see what answers we see. That is why USHCN is chosen - there is surplus information from the dense coverage.

####
Why use anomaly?

We'll see. What I want to show is that it dramatically reduces location sampling error. The reason is that the anomaly set is much more homogeneous, since the expected value everywhere is more or less zero. So there is less variation in switching stations in and out. So I'll measure the error with and without anomaly formation.

####
USHCN example

So I'll look at the data for the 1218 stations in 2010, with an anomaly relative to the 1981-2010 average. In a Monte Carlo style, I make 1000 choices of 500 random stations, and find the average for 2011, first by just averaging station temperatures, and then the anomalies. The results (in °C) are:

Base 1981-2010, unweighted .. | Mean of means .. | s.d. of means |

Temperatures | 11.863 | 0.201 |

Anomalies | 0.191 | 0.025 |

So the spatial error is reduced by a factor of 8, to an acceptable value. The error of temperature alone, at 0.201, was quite unacceptable. But anomalies perform even better with area-weighting, which should always be used. Here I calculate state averages and then area-weight the states (as USHCN used to do):

Update: I had implemented the area-weighting incorrectly when I posted about an hour ago. Now I think it is right, and the sd's are further reduced, although now the absolute improves by slightly more than the anomalies.
Base 1981-2010, area-weighted .. | Mean of means .. | s.d. of means |

Temperatures | 12.102 | 0.137 |

Anomalies | 0.101 | 0.016 |

For both absolute T and anomalies, the mean has gone up, but the SD has reduced. In fact T improves by a slightly greater factor, but is still rather too high. The anomaly sd is now very good.

Does the anomaly base matter? A little, which is why WMO recommends the latest 3 decade period. I'll repeat the last table with the 1951-90 base:

Base 1951-80, area-weighted .. | Mean of means .. | s.d. of means |

Temperatures | 12.103 | 0.138 |

Anomalies | 0.620 | 0.021 |

The T average is little changed, as expected. The small change reflects the fact that sampling 1000 makes the results almost independent of that random choice. But the anomaly mean is higher, reflecting warming. And the sd is a little higher, showing that subtracting a slightly worse estimate of the 2011 value (the older base) makes a less homogeneous set.

####
So what to make of spatial sampling error?

It is significant (with 500 station subsets) for anomaly, and the reason why large datasets are sought. In terms of record hot years, I think there is a case for omitting it. It is the error if between 2015 and 2016 the set of stations had been changed, and that happened only to a very small extent. I don't think the

*theoretical* possibility of juggling the station set between years is an appropriate consideration for such a record.

####
Conclusion

Spatial sampling, or coverage error for anomalies is significant for ConUS. Reducing this error is why a lot of stations are used. It would be an order of magnitude greater without the use of anomalies, because of the much greater inhomogeneity, which is why one should never average raw temperatures spatially.