But the meme persists, and metrology handbooks get quoted - here a JCGM guide. The theory quoted, though, is for a single measurement, where repeated readings of the same thing can't overcome lack of resolution. That isn't what happens in climate: a whole lot of different measurements are averaged.
Of course, averaging does improve accuracy. That's why people incur the cost of large samples. In this post, I'll follow my comment at WUWT by taking 13 months of recent daily maxima in Melbourne, given by BoM to 1 decimal place, and showing that if you round off that decimal, emulating a thermometer read to the nearest degree, the difference in the monthly average is only of order 0.05°C - far less than the 1°C loss of resolution. But first, I'll outline some of the theory.
Law of Large Numbers

This goes back to Bernoulli. There was much confusion at WUWT with the central limit theorem, which is not at all the same thing. The Law of Large Numbers (LoLN) deals with the convergence of a sample mean to a population mean as the sample grows (there are many formulations), whereas the CLT makes the more interesting claim that the sample mean, as a random variable itself, tends toward a normal distribution, even though the individual samples may not have been normally distributed. There are of course caveats.
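For concreteness, here is a minimal R sketch (entirely synthetic data, my own illustration) showing the two effects separately: the sample mean of skewed exponential draws converges to the population mean (LoLN), while the distribution of many such sample means looks close to normal (CLT).

```r
# Synthetic illustration: exponential draws (mean 1) are skewed, not normal
set.seed(1)

# LoLN: the sample mean approaches the population mean (1) as N grows
for (N in c(10, 1000, 100000)) {
  cat("N =", N, "  sample mean =", mean(rexp(N)), "\n")
}

# CLT: means of many modest samples are close to normally distributed,
# even though the individual draws are not
means <- replicate(10000, mean(rexp(50)))
hist(means, breaks = 50, main = "Means of 10000 exponential samples")
```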
The LoLN is what is needed here, and at WUWT a somewhat informal Wiki statement was mentioned: "The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed." The author (whose comment oddly disappeared) was reproved by Willis, who dissed Wiki and preferred: "The law of large numbers formulated in modern mathematical language reads as follows: assume that X1, X2, . . . is a sequence of uncorrelated and identically distributed random variables having finite mean μ …", emphasising uncorrelated, iid etc.
The general idea of the LoLN seems simple nowadays. If you add two independent random variables, the variance of the sum is the sum of the variances (subject to conditions, such as the variances actually existing, but not requiring normality or identical distributions). If you have a set of independent random variables $\varepsilon_i$, consider a weighted average
$$A = \sum_i w_i \varepsilon_i, \quad \text{with} \quad \sum_i w_i = 1$$
Scaling can be absorbed in the weights, so the $\varepsilon_i$ might as well have unit variance. Then the variance of $A$ is $\sum_i w_i^2$. If $A$ is a simple mean of $N$ variables, then $w_i = 1/N$ and the sum is $N \times (1/N)^2 = 1/N$. But if not, or if the variables have different variances, the convergence of the mean is still just a property of that diminishing sum.
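As a quick check of that formula, a sketch with arbitrary weights and simulated unit-variance normals (my choices, purely for illustration):

```r
# For independent unit-variance variables, var(A) should equal sum(w^2)
set.seed(1)
N <- 20
w <- runif(N); w <- w / sum(w)              # arbitrary weights summing to 1

A <- replicate(100000, sum(w * rnorm(N)))   # A = sum_i w_i * eps_i
cat("simulated var(A):", var(A), "\n")
cat("sum(w^2):        ", sum(w^2), "\n")    # for w_i = 1/N this is 1/N
```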
What about correlation? If the unit-variance variables have a correlation matrix $K$, then the combined variance is $\sum_{ij} w_i K_{ij} w_j$. Does that converge? It depends on $K$. If its coefficients do not tend to zero away from the diagonal, it may not; with uniform weights $w_i = 1/N$, the sum runs over all $N^2$ coefficients. But usually correlation does diminish as the variables become more separated in time or space.
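To see that in numbers, a sketch assuming an AR(1)-style correlation $K_{ij} = \rho^{|i-j|}$ (my choice of a matrix whose coefficients decay away from the diagonal):

```r
# Combined variance sum_ij w_i K_ij w_j for a simple mean, with a
# correlation that decays away from the diagonal
var_of_mean <- function(N, rho) {
  w <- rep(1/N, N)                       # simple mean weights
  K <- rho^abs(outer(1:N, 1:N, "-"))     # K_ij = rho^|i-j|
  drop(t(w) %*% K %*% w)                 # w' K w
}
for (N in c(10, 100, 1000))
  cat("N =", N, "  variance of mean =", var_of_mean(N, 0.5), "\n")
# Still shrinks with N, though more slowly than the uncorrelated 1/N
```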
I've included this to show where the LoLN comes from, and that lack of iid is not a showstopper.
Resolution

To be specific, suppose we have a thermometer read to a resolution of 1°C, and a succession of temperatures $T_i$ is coming in with a spread much larger than 1°C. Suppose we actually know the $T_i$ values, but they are then read to resolution - i.e. rounded.
This is equivalent to displacing each reading $T_i$ by an amount $\varepsilon_i$ of up to 0.5°C, taking it to the nearest integer. That JCGM guide puts it thus (via Pat Frank at WUWT):
"If the resolution of the indicating device is δx, the value of the stimulus that produces a given indication X can lie with equal probability anywhere in the interval X − δx/2 to X + δx/2. The stimulus is thus described by a rectangular probability distribution of width δx with variance u^2 = (δx)^2/12, implying a standard uncertainty of u = 0.29δx for any indication."
So the cost to the accuracy of the mean is the mean of those error variables, $\sum_i \varepsilon_i / N$. It is very reasonable to assume them independent: although the temperatures themselves may be correlated, their fractional parts will be much less so, provided the resolution is well finer than the total temperature range. The distributions are uniform, so the standard error of the mean of $N$ of them is $\sqrt{1/(12N)}$, which tends to zero with large $N$. That is, the mean discrepancy between rounded and exact behaves like $\sqrt{1/(12N)}$.
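A quick simulation bears this out; the temperatures here are synthetic (normal with an assumed sd of 5°C, not the BoM data):

```r
# Discrepancy between the rounded mean and the exact mean for a month
set.seed(1)
N <- 31                                     # roughly a month of readings
err <- replicate(10000, {
  temps <- rnorm(N, mean = 22, sd = 5)      # assumed "true" temperatures
  mean(round(temps)) - mean(temps)          # rounded mean minus exact mean
})
cat("sd of mean discrepancy:", sd(err), "\n")
cat("theory, sqrt(1/(12*N)):", sqrt(1/(12*N)), "\n")
```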
You may say, what if the rounding isn't perfect? What if, say, .4 is sometimes rounded up instead of down? That just changes the uniform distribution to something similar, with a slightly different variance.
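A sketch of such imperfect rounding, using an arbitrarily jittered cutoff near 0.45 (my invention for illustration):

```r
# Round up when the fractional part exceeds a wobbly threshold near 0.45
set.seed(1)
x <- runif(100000, 0, 100)                    # values to be rounded
cut <- 0.45 + runif(length(x), -0.05, 0.05)   # imperfect, jittered cutoff
err <- (floor(x) + (x - floor(x) > cut)) - x  # rounding error
cat("mean error:", mean(err), "  variance:", var(err), "\n")
# Variance stays close to the perfect-rounding 1/12 = 0.0833; the
# off-centre cutoff mainly shows up as a small bias in the mean
```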
Example - Melbourne maxima.
On pages like this, BoM shows the daily max for each recent month in Melbourne, to one decimal place. I have placed here a zipfile which contains an RData file (to load in R) called melb12.sav, holding a list of dataframes with full data for those months. There is also a file called melb13.csv, with just the maximum temperatures that were used in this test. Here is last month (Mar):
33.7 34.7 23.9 33.0 23.7 25.2 24.9 38.9 28.5 22.1 26.1 22.3 23.2 21.3 26.8 31.4 32.5 19.5 18.8 23.3 23.5 24.3 28.8 21.2 20.4 20.2 19.9 19.2 17.9 18.7 22.7
Suppose we had a thermometer reading to only 1°C - so all these were rounded, as in the JCGM description. For the last 13 months, here are the means from the BoM data (1 dp) and from that thermometer (0 dp):
         Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec    Jan    Feb    Mar
1 dp:  22.72  19.24  17.13  14.43  13.29  13.85  17.26  24.33  22.73  27.45  25.98  25.10  24.86
0 dp:  22.77  19.27  17.13  14.37  13.29  13.84  17.33  24.35  22.67  27.48  26.00  25.17  24.84
diff:   0.05   0.03   0.00  -0.06   0.00  -0.01   0.08   0.03  -0.06   0.03   0.02   0.08  -0.02
The middle row, measured day by day to 1°C, has a far more accurate mean than that resolution. As a check, the sd of the differences (bottom row) is expected from the above to be $\sqrt{1/(12 \times 31)}$ (a slight approximation, since the months vary in length), which is 0.052. The sd of the diffs shown is 0.045. So the monthly average at 1°C resolution is accurate to about 0.05°C.
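As a cross-check, this snippet reproduces the Mar column of the table from the daily maxima listed above; R's round(), which takes halves to the even digit, gives the 0 dp value shown:

```r
# Daily maxima for Mar, as listed above, and the two monthly means
melb_mar <- c(33.7, 34.7, 23.9, 33.0, 23.7, 25.2, 24.9, 38.9, 28.5, 22.1,
              26.1, 22.3, 23.2, 21.3, 26.8, 31.4, 32.5, 19.5, 18.8, 23.3,
              23.5, 24.3, 28.8, 21.2, 20.4, 20.2, 19.9, 19.2, 17.9, 18.7,
              22.7)
m1 <- mean(melb_mar)            # 1 dp mean: 24.86
m0 <- mean(round(melb_mar))     # 0 dp "thermometer" mean: 24.84
round(c(m1, m0, m0 - m1), 2)    # 24.86 24.84 -0.02, matching the table
```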