Tuesday, June 13, 2017

Integrating temperature on sparse subgrids

I've been intermittently commenting on a thread at the long-quiet Climate Audit site. Nic Lewis was presenting an interesting analysis of the effect of interpolation length in GISS, using the Python version of the GISS code that he has running. So the talk turned to numerical integration, with the usual grumblers saying that it is all too complicated to be done by any but a trusted few (who actually don't seem to know how it is done). Never enough data, etc.

So Olof chipped in with an interesting observation: with the published UAH 2.5x2.5° grid data (lower troposphere), an 18-point subset was sufficient to give quite good results. I must say I was surprised that so few would do, but he gave this convincing plot:

He made it last year, so it runs to 2015. There was much scepticism there, and some aspersions, so I set out to emulate it, and of course, it was right. My plots and code are here, and the graph alone is here.

So I wondered how this would work with GISS. GISS isn't as smooth as UAH, and the 250 km interpolation is less smooth than the 1200 km version. So while 18 nodes (6x3) aren't quite enough, 108 nodes (12x9) are pretty good. Here are the plots:

I should add that this is the very simplest grid integration, with no use of enlightened infilling, which would help considerably. The code is here.
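For anyone wanting to try this, here is a minimal Python sketch of that kind of simple subgrid integration. It is not the linked code; it assumes a regular latitude-longitude anomaly array with NaN for empty cells, nodes taken at evenly spaced rows and columns, and plain cos(latitude) area weights with no infilling. The function names and the synthetic grid are hypothetical.

```python
import numpy as np


def area_weighted_mean(anom, lats):
    """Cos-latitude weighted mean of an (nlat, nlon) anomaly grid; NaN cells are skipped."""
    weights = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], anom.shape)
    ok = ~np.isnan(anom)
    return float(np.sum(weights[ok] * anom[ok]) / np.sum(weights[ok]))


def subgrid_mean(anom, lats, n_lon, n_lat):
    """Average over an n_lon x n_lat set of evenly spaced nodes (e.g. 6x3 = 18, 12x9 = 108)."""
    nlat, nlon = anom.shape
    # Pick the row/column at the middle of each of n_lat (n_lon) equal blocks
    rows = ((np.arange(n_lat) + 0.5) * nlat / n_lat).astype(int)
    cols = ((np.arange(n_lon) + 0.5) * nlon / n_lon).astype(int)
    return area_weighted_mean(anom[np.ix_(rows, cols)], lats[rows])


# Hypothetical 2.5-degree grid: 72 latitude bands x 144 longitudes
lats = -88.75 + 2.5 * np.arange(72)
rng = np.random.default_rng(0)
anom = 0.5 + 0.3 * rng.standard_normal((72, 144))  # synthetic anomalies, not real data
print(area_weighted_mean(anom, lats),
      subgrid_mean(anom, lats, 6, 3),
      subgrid_mean(anom, lats, 12, 9))
```

Repeating this month by month over a real grid gives a full-mesh vs 108-point vs 18-point comparison; infilling empty cells before averaging would be the obvious refinement.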

Of course, when you look at a statistic over a longer period, even this small noise fades. Here are the GISS trends over 50 years:

1967-2016 trend (°C/century) | Full mesh | 108 points | 18 points
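Trends like these are presumably ordinary least-squares slopes. A small sketch of that calculation, assuming a series of annual global anomalies (one value per year, 1967-2016) and converting the slope to °C/century; the function name and the synthetic series are hypothetical.

```python
import numpy as np


def ols_trend_per_century(years, anomalies):
    """Least-squares slope of anomaly vs year, scaled from degC/year to degC/century."""
    years = np.asarray(years, dtype=float)
    y = np.asarray(anomalies, dtype=float)
    ok = ~np.isnan(y)  # skip missing years
    slope_per_year, _intercept = np.polyfit(years[ok], y[ok], 1)
    return 100.0 * slope_per_year


years = np.arange(1967, 2017)               # 50 years, 1967-2016 inclusive
synthetic = 0.017 * (years - 1967) + 0.05   # made-up series with a 1.7 C/century slope
print(ols_trend_per_century(years, synthetic))
```

Applying the same function to the full-mesh, 108-point and 18-point series would give the three columns. For a monthly series, fractional years (e.g. year + (month - 0.5)/12) can serve as the x values.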

This is a somewhat different problem from my intermittent search for a 60-station subset. There has already been smoothing in gridding. But it shows that the spatial and temporal fluctuations that we focus on in individual maps are much diminished when aggregated over time or space.


  1. Nick's link to the ClimateAudit article and the following discussion does not work for me. If anyone wants to read the discussion between Nic, Nick, Olof, Jerry et al. on this interesting topic, you can find it here: https://climateaudit.org/2017/05/18/how-dependent-are-gistemp-trends-on-the-gridding-radius-used/#comments (until the original link has been fixed)

    1. Thanks, Erik,
      I'm not sure how I messed that up, but I hope it is fixed now.

  2. Good grief, the Pat and Jerry show... where's the manuscript? Lol.

  3. Although those commenters at ClimateAudit appear well-credentialled, it also appears they can't tell the difference between solving partial differential equations and doing spatial interpolation. Why is that? Are they one-trick ponies that are so involved in their own research topic that they have lost the ability to do any other flavor of applied math? Are they simply unable to intuit anything?

    The other explanation is that it may be impossible to make any headway with them, since they operate under a political agenda, not a scientific one. And it may be that they are just toying with you. In that case what's more important is that they convince their followers with their sophistry and not you.

    The third possibility is that they are suffering from emeritus-syndrome, "going emeritus".

    At some point it ceases to be fun as it's the equivalent of punching down.

  4. Hello Nick, Olof & alii,

    I don't know whether the comment I sent to Climate Audit will be published, so I'm sending a copy to Moyhu as well :-)


    This comment does not primarily focus on Nic Lewis' head post.

    After reading the entire sequence of Gerald Browning's reactions to Olof R's 18-point sparse data experiment with UAH's 2.5° grid dataset, I'm not quite sure whether he really understood Olof's intention.

    And maybe he still had not worked out exactly where the data came from (at least not before Nick Stokes published a piece of R code referring directly to the UAH grid source).

    Olof did not use any model, nor did he smooth anything. Admittedly, he used yearly data, which led to unnecessary criticism; that's why I use monthly time series instead.

    Anyone can find the UAH 2.5° grid dataset for the lower troposphere in the files

    The same structure exists for the other atmospheric layers.

    For every year, these datasets consist of a 12-month sequence of 144 x 72 = 10,368 grid cells covering the entire planet.

    But UAH in fact publishes usable data only for the range 82.5°S to 82.5°N: the three latitude stripes nearest each pole contain nothing useful. The area of interest is thus restricted to 144 x 66 = 9,504 grid cells.

    When I first saw Olof's 18-point example at WUWT:


    I was very impressed and wanted to do something similar, but by processing the UAH data directly instead of using KNMI's Climate Explorer.

    And instead of repeating Olof's experiment with 18 evenly distributed points, I chose 32, 128 and 512 UAH grid cells and compared their monthly time series with the one I obtained from the entire 9,504-cell set.

    That full-set average is, of course, nearly identical to UAH's Globe data, found in column 3 of


    The linear trend difference between the two is below 0.01 °C / decade, so my UAH grid data processing can't be that wrong.

    Below is a chart (made using good ol' Excel) comparing the plots obtained from the monthly data of the different subsamples (32, 128 and 512 grid cells) with that of the entire 9,504-cell set, for the period Dec 1978 to Dec 2016:


    It is amazing how well 512 cells fit the whole grid (though they amount to barely more than a laughable 5% of it); and even the approximations using only 32 or 128 cells are already quite impressive.

    Thus Olof's UAH and Nick's GISS subsampling approaches are imho correct, and show that the real consequences of undersampling the planet's surface are somewhat overestimated.


    Let me show this with a similar test, this time using unadjusted GHCN data, for which both the deviations from the mean and, above all, the linear trend estimate are well above those of GISS.


    In red you see a global average of all 7,280 GHCN land stations worldwide; in green a subsampling generated by allowing only one (randomly chosen) GHCN station per 5° grid cell to contribute to the average.

    Apart from the fact that here, too, the subsample's linear trend for 1880-2016 differs from that of the complete set by less than 0.01 °C / decade, their 60-month running means are amazingly similar.

    And since ocean surfaces not only show cooler trends than land surfaces but are also more homogeneous, adding ERSST data to GHCN's would certainly make the two plots even more similar.

    J.-P. Dehottay alias Bindidon

  5. Bindidon - come on, you've used some sly trick, right?

    1. I'm so terribly sorry to disappoint you, JCH! I'm anything but a fan of trick zones of any kind :-)

  6. Nice to see all the interest in these statistical exercises...

    As far as I understand, the satellite readings are binned in 2.5 degree cells with no additional smoothing, so the grid cells are independent of each other. It's different with the GISTEMP grid: the 1200 km interpolation smooths the field, ERSST data is interpolated and smoothed, and the GHCN adjustment also results in some smoothing between nearby stations. But I still believe that there is more noise in surface data than in troposphere data.

    Anyway, I have tried to take this 18-sample idea one step further towards pure and independent data. In the following chart I have used unadjusted station data from GHCN, picking one station near each of those 18 grid cells, with as few data gaps as possible over the last 60 years (further back, Antarctic data are lacking):

    Well, there is some noise compared to GISTEMP LOTI, but the global warming signal is quite clear.

    1. Thanks Olof, good idea!

      Could you publish the list of the 18 stations you selected (the 11-digit station IDs will suffice, of course)?

    2. Sure, here are the WMO station codes given by KNMI Climate Explorer (the one with a dot is a US "near WMO" station)

      You may notice that "near" (the 18 points) can be a little stretched out in the empty southern oceans.

      Also, I just saw that some noise can disappear with a more rigorous treatment of missing data. The link above is live so the chart may change slightly when/if I have time...

    3. Thanks again Olof.

      ... and the GHCN adjustment also results in some smoothing between nearby stations.

      You are right, but looking at the chart below, you will surely agree that the adjustments are very small:


      { You might suspect some bias here: I don't perform any latitude weighting for grids because for me, temperatures are temperatures, regardless of the latitude where they are measured. I would understand cosine weighting if we were comparing solar irradiance per grid cell or the like. }

      Interested parties keep up a huge polemic about the differences between the two GHCN variants.

      But these people talk only about GHCN stations whose adjusted trend exceeds the unadjusted one, never about the reverse:


    4. Bindi, nice work with the adjusted/unadjusted comparison. The Berkeley Earth team reached similar conclusions (and produced a raw land-only dataset):


      The effects of adjustments are small and not what sceptics believe. The mainly natural warming before 1950 has been increased by adjustments, whereas the AGW after 1950 has been reduced by adjustments.

      This also disproves the common claim that adjustments are made to fit observations to models, because the raw dataset actually fits the models slightly better than the adjusted one does.

      Regarding the station list above, there has been an accidental rounding error with the dot-numbered US station. The right station code is 351.765

  7. Hello Nick, many thanks for your very good job of extending Olof's idea from UAH to GISS.

    GISS grid data isn't available as a text file (unlike, e.g., JMA's), and I still don't want to go down to the NetCDF level, so your effort is doubly welcome.

    But at WUWT, for example, when I showed small extensions of Olof's idea, I got remarks like "That what we know is partly redundant does not mean that interpolating what we don't know is correct".

    Maybe you could find a little time to explain why latitude weighting has to be used when processing gridded data... I understand the principle, of course, but not its necessity.

    And anyway I'm wondering why my averaging of UAH or JMA grid data fits so exactly to their own averaged output, even though my averaging does not contain any latitude cosine weighting :-)

    1. "even though my averaging does not contain any latitude cosine weighting"
      Area weighting is necessary in the case where a region is behaving differently to the rest. Then you need to make sure that it makes a proportionate change to the average. If a region is incorrectly weighted, but behaving like the average, then the weighting doesn't matter.

      That is why properly weighting the Arctic is important (per Cowtan and Way). Africa is also poorly covered, and wrongly weighted in HADCRUT, but that has little systematic effect, because it is not doing anything different.
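A small numerical illustration of that point about area weighting, using a hypothetical 2.5° grid like UAH's (band centres assumed at -88.75°, ..., 88.75°) restricted to the 66 bands between 82.5°S and 82.5°N, i.e. 9,504 cells. The anomalies are made up: first a uniform anomaly, then the same with 2 °C of extra warming north of 60°N. Those 9 northern bands are about 14% of the 66 bands but only about 6% of the area, so an unweighted mean over-counts them.

```python
import numpy as np

N_LON = 144
# Hypothetical band centres of a 2.5-degree grid, south to north: -88.75 ... 88.75
all_lats = -88.75 + 2.5 * np.arange(72)
# Keep bands lying entirely within 82.5S-82.5N (drops three bands at each pole)
lats = all_lats[np.abs(all_lats) + 1.25 <= 82.5]
weights = np.cos(np.deg2rad(lats))  # relative area of each band


def plain_and_weighted(band_anoms):
    """Unweighted and cos-latitude weighted means of per-band anomalies."""
    return float(np.mean(band_anoms)), float(np.average(band_anoms, weights=weights))


# Case 1: every band has the same anomaly, so weighting cannot matter
uniform = np.full(lats.size, 0.5)
# Case 2: as case 1, but bands north of 60N get an extra 2.0 C (made-up numbers)
arctic = uniform + np.where(lats > 60, 2.0, 0.0)

print(lats.size, lats.size * N_LON)  # 66 9504
print(plain_and_weighted(uniform))   # the two means agree
print(plain_and_weighted(arctic))    # the unweighted mean over-counts the Arctic
```

In the first case the choice of weights is irrelevant, which is consistent with unweighted averages matching a weighted product when no region departs much from the average; in the second, the region behaving differently is exactly where the weighting shows.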