moyhu: Flutter in GHCN V3 adjusted temperatures.

Thursday, February 9, 2017

Flutter in GHCN V3 adjusted temperatures.

In the recent discussion of the kerfuffle of John Bates and the Karl 2015 paper, the claim of Bates that the GHCN adjustment algorithm was subject to instability arose. Bates claim seemed to be of an actual fault in the code. I explained why I think that is unlikely, but rather it is a feature of the Pairwise Homogenisation Algorithm (PHA).

GHCN V3 adjusted is issued approximately daily, although it is not clear how often the underlying algorithm is run. It is posted here - see the readme file and look for the qca label.

Paul Matthews linked to his analysis of variations in Alice Springs adjusted over time. It did look remarkable; fluctuations of a degree or more over quite short intervals, with maximum excursions of about 3°C. This was in about 2012. However Peter O'Neill had done a much more extensive study with many stations and more recent years (and using many more adjustment files). He found somewhat smaller variations, and of frequent but variable occurrence.

I don't have a succession of GHCN adjusted files available, but I do have the latest (downloaded 9 Feb) and I have one with a file date here of 21 June 2015. So I thought I would look at differences between these to try to get an overall picture of what is going on.

I restricted to data since 1880, in line with what most indices use. So the first thing I should show is a histogram of all the differences for all stations:

The mean is -0.004°C and the sd is 0.331°C. Here is a breakdown by months - the result is remarkably even

	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
Mean	-0.0039	-0.0039	-0.0038	-0.0041	-0.0042	-0.0041	-0.0039	-0.0038	-0.0039	-0.0039	-0.0039	-0.0041
sd	0.3312	0.3311	0.331	0.3306	0.3313	0.3312	0.331	0.3307	0.3304	0.3305	0.3306	0.3305

I next looked at the years since 1999 - 21st Century. Again the histogram was:

Now the mean was - 0.0017, and sd 0.221. And the breakdown by months was

	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
Mean	-0.0018	-0.0017	-0.0012	-0.0012	-0.0029	-0.0022	-0.0018	-0.0015	-0.0017	-0.0013	-0.0017	-0.0018
sd	0.2202	0.2194	0.2191	0.2184	0.2236	0.223	0.2228	0.2215	0.2201	0.2196	0.2197	0.2188

Analysis

The PHA is a trade-off. It seeks to reduce bias from non-climate events, which would not be reduced by the veraging process. The cost is a degree of uncertain and sometimes wrong identification, which appears as added noise. Now noise is heavily damped by the averaging, as long as it is unbiased. Ensuring that is part of the design of the algorithm, and can be tested on synthetic data.

Here there is quite substantial noise showing up as time discrepancies. I did a demonstration a while ago showing that adding white noise of even 1°C amplitude made virtually no difference to the average. So thinking of the global average, the sd of 0.33°C for the whole period is not necessarily alarming. And what is reassuring is that the mean is very close to zero, not only overall but for each month. This strongly suggests that the noise does not introduce bias.

I'd like to take this further with a regional breakdown, and rural/urban. But for the moment, I think is expands the picture of this flutter and what it means.

Appendix - a comment from Bob Koss, which I am posting here to get readable format

I noticed a couple people mentioned v4 USCRN data. They aren't in v3.

Here are a couple data tables giving data means and tallies. Adjusted - Raw calculations.

USCRN from v4.b.1.20170209.
Year  -Mean   -Mths   +Mean   +Mths   ±Mean ±Mths All_Mean All_Mths Stns
2001  0.000   0       0.000   0       0.000   0       0.000    11       2
2002  -0.210  5       0.225   18      0.130   23      0.023    129      17
2003  -0.215  24      0.237   48      0.086   72      0.018    339      39
2004  -0.292  29      0.233   97      0.113   126     0.022    638      67
2005  -0.328  37      0.256   133     0.129   170     0.025    861      79
2006  -0.322  26      0.265   155     0.181   181     0.033    982      92
2007  -0.210  11      0.294   170     0.263   181     0.040    1199     104
2008  -0.210  11      0.339   154     0.302   165     0.040    1237     106
2009  -0.210  12      0.358   145     0.315   157     0.039    1252     105
2010  -0.210  11      0.368   135     0.324   146     0.038    1239     106
2011  -0.210  12      0.367   106     0.309   118     0.029    1239     105
2012  -0.210  12      0.381   95      0.314   107     0.027    1267     106
2013  -0.210  11      0.379   48      0.269   59      0.013    1267     106
2014  0.000   0       0.449   11      0.449   11      0.004    1262     106
2015  0.000   0       0.000   0       0.000   0       0.000    1264     106
2016  0.000   0       0.000   0       0.000   0       0.000    1261     106
2017  0.000   0       0.000   0       0.000   0       0.000    105      105

GHCN from v3.3.0.20170201
Year  -Mean   -Mths   +Mean   +Mths   ±Mean ±Mths All_Mean All_Mths Stns
2001  -0.527  7774    0.456   7586    -0.041  15360   -0.022   28608    2752
2002  -0.523  7543    0.450   7517    -0.037  15060   -0.019   28993    2786
2003  -0.530  7388    0.446   7547    -0.037  14935   -0.018   29861    2778
2004  -0.523  7131    0.443   7279    -0.035  14410   -0.017   28963    2809
2005  -0.515  6869    0.446   6985    -0.031  13854   -0.015   28215    2677
2006  -0.511  6567    0.442   6968    -0.020  13535   -0.010   28238    2655
2007  -0.511  6333    0.436   6893    -0.017  13226   -0.008   28720    2640
2008  -0.507  6156    0.419   6786    -0.022  12942   -0.010   29013    2653
2009  -0.490  5767    0.401   6618    -0.014  12385   -0.006   29050    2659
2010  -0.467  5528    0.388   6400    -0.008  11928   -0.003   29244    2666
2011  -0.449  5213    0.376   6091    -0.004  11304   -0.002   28670    2663
2012  -0.430  4816    0.353   5645    -0.007  10461   -0.003   28606    2634
2013  -0.403  4331    0.322   5253    -0.006  9584    -0.002   28247    2575
2014  -0.366  4032    0.295   4924    -0.002  8956    -0.001   27937    2525
2015  -0.354  3589    0.281   4386    -0.005  7975    -0.001   26151    2465
2016  -0.355  3279    0.278   4013    -0.007  7292    -0.002   24368    2199
2017  -0.611  86      0.406   332     0.197   418     0.167    494      494

Note: GHCN makes no adjustments for the past two years other than using TOBS corrections for USHCN data. A large number of stations are labeled a total 
failure by PHA. Over the passage of years eventually many of these failures are accepted as valid with some being adjusted and others simply passed 
along. By the time you get back to 1951, 48% of the data is adjusted down while 23% is adjusted up.

2016 had 29162 months at 2594 stations having at least one month of valid data in the qcu. That is after cleaning errors.


2016 had 29162 months at 2594 stations having at least one month of valid data in the qcu. That is after cleaning errors.

45 comments:

UnknownFebruary 9, 2017 at 11:35 PM
The problem seems to be somewhat congruent to multiple sequence alignment in bioinformatics, which suffers the same sort of issues - the area of the energy landscape close to the global minimum is very flat and has lots of local minima.

The only real solution I know of is to produce an ensemble of outputs (or even better, represent the entire energy landscape). That however means long calculations and vast downloads, which we know from experience (e.g. with the HadCRUT4 ensemble) everyone will ignore anyway. I believe GHCN do have an ensemble, but I've never heard of anyone using it.

I suspect that the ensemble results are very stable over time, and that the flutter essentially arises from the adjustments crudely sampling within the ensemble space. It's an interesting area for further study though. If I had time I'd start by running the current data through PCA, and then truncating months of the end to see how things change. Then I'd try adding noise to the current data and see if that produces the same kind of spread.
ReplyDelete
Replies
Victor VenemaFebruary 10, 2017 at 5:14 AM
That the homogenized data for some stations flatters is in itself okay. Also if the algorithm works right that will unavoidably happen.

Every day new data comes in. That makes it possible to see new inhomogeneities. These breaks are detected using the statistical test SNHT. Sometimes breaks will be seen as statistically significant that with one more data point just do not cross the significance threshold and with again new data will be significant again. And so on. One significant break can also influence whether other breaks in the pair are detectable.

After detecting the breaks in the pairs, these breaks are assigned to a specific station (called attribution in the paper), whether a break is detected and the exact year in which it is detected will influence this attribution. If one station has a break that is near statistically significant, this could thus even influence the results for its surrounding stations.

The influence of inhomogeneities is largest for stations and becomes less for networks, continents and the world. In the upcoming GHCNv4 homogenization will likely not change the global mean warming much any more.

Homogenization improves the data the most at the station level and smaller scales, but data at the station level is still highly uncertain. If these small scales are important to you, please contact your local national weather service, they know much better what happened to their network and their data will likely be more accurate that what we can do for a global dataset.

The pairwise homogenization algorithm is fully automatic. It is thus easy to run it every night and that gives the most accurate results. Last time I asked, but that is years ago, NOAA also actually ran the algorithm every night.
ReplyDelete
Replies
Bryan - oz4casterFebruary 10, 2017 at 7:22 AM
I'd be curious to see what homogenization does to USCRN data. I would expect any changes introduced to be an indication of potential error introduce by homogenization.
ReplyDelete
Replies
David YoungFebruary 11, 2017 at 3:47 AM
I guess the naive question is why doesn't NOAA do the hard grunt work of evaluating stations data on a case by case basis and carefully documenting the adjustments. Once past adjustments are assigned, they should be frozen for all future updates.

Wind tunnel tests are evaluated and data adjusted differently for each different test set up. Using an automated "algorithm" would be an inferior method. No honest specialist would endorse such a fluttering algorithm. The result is better data and a traceable case by case documentation.

The noise being randomly distributed for a couple of cases examined is not very convincing to me. NOAA is paid for by US taxpayers. They should prioritize a more defensible analysis of particularly US weather station data.
ReplyDelete
Replies
David YoungFebruary 11, 2017 at 3:49 AM
A more important question is why this flutter issue has not received significant attention in the literature. Perhaps its there and I'm unaware of it. Paul Matthews has documented that NOAA simply refused to reply or respond when the issue was pointed out to them multiple times.
ReplyDelete
Replies
Olof RFebruary 11, 2017 at 10:10 AM
Nice work Nick,
I think you should redo this exercise with GHCNv4.
I'll bet that the relative frequency and magnitude of the flutter will be much smaller in v4.

V4 does a much more sensible adjustment in Alice Springs (If we can accept that it discards all data before 1941, I dont know why, but I believe that the station moved from the town to the airport then)
https://www1.ncdc.noaa.gov/pub/data/ghcn/v4/beta/products/StationPlots/AS/ASN00015590/

I have seen that GHCN v3 can do strange things with remote lonely stations, for instance those in the high Arctic. I believe that GHCN v4 will be a general remedy for this kind of problems. If the lonely stations are supported by new neighbour stations, it will be easier for the PHA to "decide" if the temperature changes are real or not..
ReplyDelete
Replies
Nick StokesFebruary 11, 2017 at 10:50 AM
Bob Koss tried to post a comment, but ran into trouble. I have posted it as an appendix to the main post above, to preserve the format.
ReplyDelete
Replies
AnonymousFebruary 13, 2017 at 8:46 PM
Nick and Victor: When I look at BEST's plots of the difference between station data and the "regional expectation", there often seems to be a strong seasonal signal. Due to local environment, during summer a station may be warmer than average for the region and the opposite in winter. When a breakpoint detection algorithm is on the verge of reporting a shift to warmer readings, that shift is most likely to be detected in the summer. The following winter, there may be less confidence that a breakpoint has been detected. FWIW, this seems to be one mechanism that could cause "flutter" in the homogenized output from some stations.

Frank
ReplyDelete
Replies

Add comment

An interactive topic index for all Moyhu posts.
Latest Ice and Temperature data
Climate Data Portals
A gallery of Javascript-enhanced graphics
Temperature trend viewer
Climate Plotter V2
Google Maps and GHCN
WebGL map of past GHCN/SST station temperatures
WebGL map of GHCN/SST station temperature trends
HiRes NOAA OI SST with WebGL and Movie
Regional Hi-Res SST movies
WebGL Facility
TempLS Guide
More pages, and blog glossary

moyhu

Thursday, February 9, 2017

Flutter in GHCN V3 adjusted temperatures.

Flutter in GHCN V3 adjusted temperatures.

Analysis

Appendix - a comment from Bob Koss, which I am posting here to get readable format

45 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Blog Archive

Translate

Resources

About Me

moyhu

Thursday, February 9, 2017

Flutter in GHCN V3 adjusted temperatures.

Flutter in GHCN V3 adjusted temperatures.

Analysis

Appendix - a comment from Bob Koss, which I am posting here to get readable format

45 comments:

Maintained Pages

Search This Blog

Recent Comments

Blogroll

Subscribe To

Blog Archive

Translate

Resources

About Me