Friday, June 26, 2015

TempLS V3 - part 2

In my previous post, I began a three-part description of the new version of TempLS. That part was devoted to the collection of land and SST data into a temperature data array.

The next part is about forming the other basic array for the calculation - the weighting array, with one entry for each temperature measurement. I've described in various posts (best summarized here) the principles involved. A spatial average (for a period of time, usually a month) requires numerical spatial integration. The surface is divided into areas where the temperature anomaly can be estimated from data. If the subdivisions do not cover the whole area, the rest will be assumed (by the arithmetic) to have the average anomaly of the set that does have data. The integral is formed by adding the products of the estimates and their areas. That makes a weighted sum of the data.
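
To make the arithmetic concrete, here is a minimal sketch (not TempLS code, just invented numbers) of that weighted sum and its normalisation:

# Toy sketch of spatial integration as a normalised weighted sum (invented numbers)
anom=c(0.4, 0.7, 0.1)     # anomaly estimates for regions that have data
area=c(2.0, 1.5, 0.5)     # the areas of those regions
sum(area*anom)/sum(area)  # the spatial average
# Regions with no data contribute to neither sum, so the arithmetic
# implicitly assigns them this average value.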

Traditionally, this was done with a lat/lon grid, often 5°×5°. An average is formed for each cell with data, in each month, and the cell averages are summed, weighted by cell area, which is proportional to the cosine of latitude. That is not usually presented as an overall weighted average, but it is one. HadCRUT and NOAA use that approach. TempLS originally did that too, so the weight for each station was the cell area divided by the number of stations reporting in that cell that month. Effectively, the stations are allocated equal shares of the area of their cell.
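
As a hedged illustration of that rule (a toy example with invented station coordinates, not the code further below):

# Toy sketch of grid weighting on a 5x5 degree grid (invented stations)
lat=c(47.3,48.1,-12.6); lon=c(2.4,3.9,130.2)
cell=paste(lat%/%5,lon%/%5)   # label each station by its cell
n=table(cell)                 # stations reporting in each cell this month
w=cos((lat%/%5*5+2.5)*pi/180)/as.numeric(n[cell])
w  # cell area (prop. to cos of cell-centre latitude) shared among its stations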

Of course, this leaves some cells with no data. Cowtan and Way explored methods of interpolating these areas (discussed here). It makes a difference, because, as said, if nothing is done, those areas will be assigned the global average behaviour, from which they may in fact deviate.

Recently, I have been using in preference an irregular triangular mesh (as shown right), of which the stations are the nodes. The weighting is made by assigning to each station a third of the area of each triangle of which it is a vertex (see here). This gives complete area coverage. I have continued to publish results from both the grid and mesh weighting.
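
Here is a toy sketch of that rule, with a hypothetical two-triangle mesh (not the real code, which follows further down):

# Toy sketch: each node gets a third of the area of every triangle it belongs to
tri=rbind(c(1,2,3),c(2,3,4))  # two triangles sharing an edge (rows of node indices)
area=c(0.8,1.2)               # their (invented) areas
w=rep(0,4)                    # one weight per node
for(t in 1:nrow(tri)) w[tri[t,]]=w[tri[t,]]+area[t]/3
w                             # nodes 2 and 3 pick up shares from both triangles
all.equal(sum(w),sum(area))   # the triangle areas are fully distributed

The constant factor of 1/3 doesn't actually matter, because, as noted below, any weighted sum is later divided by a normalising sum.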

The code below shows how to form both kinds of weights. Note that the weighting does not use the actual measurements, only their existence. Stations with no result in a month get zero weight. There is no need for the weights to add to 1, because any weighted sum is always divided by a normalising sum.


The stage begins by loading the data saved in the previous part. At this stage the land and SST data are concatenated, and a combined inventory made, from which the lat/lon of each station is taken. R[] holds the filenames of saved data, consistent with the job control.
if(do[3]){ #stage 3 -  wts from supplied x
 load(R[2])  # ERSST x,iv
 x0=x;i0=cbind(iv,NA,NA,NA,NA); rad=pi/180;
 load(R[1])  # GHCN x,iv
 colnames(i0)=colnames(iv)
 x=rbind(x,x0);iv=rbind(iv,i0)  # combine

Then the choice is made (from the job, third char) between grid and mesh. We'll start with grid:
 if(kind[3]=="g"){  #Grid
  dl=180/36 # 36*72 grid for wts
  jv=as.matrix(iv[,1:2])%/%dl # latlon integer number for each cell
  y=cos((jv[,1]+.5)*dl*rad)  #trig weighting for area each cell
  kv=drop(jv%*%c(72,1)+1297)  # Give each cell a unique number
  w=array(0,dim(x))
  for(i in 1:ncol(x)){  # over months
   j=which(!is.na(x[,i])) #  stations with data
   s=1/table(kv[j]) # reciprocal of the station count in each occupied cell
   u=as.numeric(names(s)) #cell numbers
   k1=match(kv[j],u) # which cell numbers belong to each stat
   w[j,i]=y[j]*s[k1]
  }
 }else{ # Mesh

It's just a matter of assigning a unique number to each cell, then for each month counting the reporting stations in each occupied cell and dividing that cell's area weight among them. In R it is important to use fairly coarse-grained operations like table() and match(), because each command carries an interpretive overhead.
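
As a hedged illustration of that counting idiom (invented cell numbers, not the real grid):

# Toy sketch of the table/match idiom used in the grid code above
kv=c(5,5,9,12,9,9)      # invented cell number for each reporting station
s=1/table(kv)           # reciprocal of the station count in each occupied cell
u=as.numeric(names(s))  # the occupied cell numbers: 5, 9, 12
k1=match(kv,u)          # which occupied cell each station falls in
s[k1]                   # each station's share: 1/2, 1/2, 1/3, 1, 1/3, 1/3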

Then, continuing with the "else" part, for mesh weighting:
  require("geometry")
  require("alphahull")
  a=pc("w_",R[3])
  if(file.exists(a)){load(a)}else{w=array(0,dim(x))}  # old mesh
  a=cos(cbind(iv[,1:2],90-iv[,1:2])*rad)
  z=cbind(a[,1]*a[,c(2,4)],a[,3]) # to 3D coords
  if(!exists("w_yr"))w_yr=yr
  w=w[,mth(w_yr)%in%mth(yr)]
  if(sum(dim(w)==dim(x))<2){w=array(0,dim(x));}
  for(ii in 1:ncol(x)){
   o=!is.na(x[,ii]) # stations with data
   no=sum(o); if(no<9)next;  # skip months at the end
   if(sum(xor(o,(w[,ii]>0)))==0)next; # w slots match x, so use old mesh
   y=z[o,]
   m=convhulln(y) # convex hull (tri mesh)
   A=0
   B=cbind(y[m[,1],],y[m[,2],],y[m[,3],])
   for(j in 0:2){  # determinants for areas
    k=(c(0:2,0,2,1)+j)%%3+c(1,4,7)
    A=A+B[,k[1]]*B[,k[2]]*B[,k[3]]-B[,k[4]]*B[,k[5]]*B[,k[6]]
   }
   A=abs(A)  # triangles from convhulln have varied orientation
   n=cbind(c(m),rep(A,3)) # long list with areas matched to nodes
   k=which(o)
   w[!o,ii]=0
   w[o,ii]=1e-9  # to mark stations that missed wt but have x entries
   for(jj in 1:99){  # Adding areas for nodes
    i=match(1:sum(o),n[,1]) # find first matches in mesh
    q=which(!is.na(i))
    w[k[q],ii]=w[k[q],ii]+n[i[q],2]  # add in area
    n=rbind(n[-i[q],]) # take out matches
    if(length(n)<1)break;
   }
   print(c(ii,nrow(m),sum(w[,ii])/pi/24))
  } # end mesh months loop

This is more complicated. The mesh is actually the convex hull of the points on the sphere, which the R package geometry calculates via convhulln(). This is quite time consuming (about 20 mins on my PC), so I save the results in a "w_.." file. The code loads the most recent such file, if any, and for months where the pattern of reporting stations matches the stored weights, it reuses them, which is much faster. The lat/lon of the stations are also converted to 3D coordinates on a unit sphere.
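
For clarity, here is the same lat/lon-to-unit-sphere conversion written out longhand, a sketch with two invented stations (the code above does it in one cbind of cosines):

# Sketch of the conversion to 3D unit-sphere coordinates (invented stations)
rad=pi/180
lat=c(51.5,-33.9); lon=c(0.1,151.2)
z=cbind(cos(lat*rad)*cos(lon*rad),   # x
        cos(lat*rad)*sin(lon*rad),   # y
        sin(lat*rad))                # z
rowSums(z^2)                         # each row has unit length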

So we enter the ii loop over months with x and a w array waiting to be filled. If the stored w doesn't match, the convex hull is formed. Then the areas A are found. Note that convhulln() does not consistently orient triangles, so the abs() is needed. The "areas" are in fact 3×3 determinants of the vertex coordinates, which for the small triangles of a dense mesh are effectively proportional to the planar triangle areas, again calculated in a way that minimises looping.
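
To check what the j loop is doing, here is the same quantity for a single invented triangle, computed directly with det() (a sketch, not part of TempLS):

# Sketch: the cofactor sum in the j loop equals a 3x3 determinant of vertex coords
p=function(lat,lon){ r=pi/180
  c(cos(lat*r)*cos(lon*r), cos(lat*r)*sin(lon*r), sin(lat*r)) }
y1=p(10,20); y2=p(12,21); y3=p(11,23)  # three invented points on the unit sphere
abs(det(rbind(y1,y2,y3)))              # |y1.(y2 x y3)|, what the loop accumulates
# This is 6x the volume of the tetrahedron formed with the origin, and for a
# small triangle close to twice its planar area - good enough, since the
# weights are normalised anyway.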

Then it's just a matter of assigning area sums to nodes. This isn't so easy without fine looping, since each column of the m array of triangles has multiple occurrences of nodes. So in the jj loop, I find the first occurrence of each node in the list, add those areas to the node totals, remove those rows, and repeat until all areas have been allocated (an equivalent vectorised form is sketched below). The last part just saves the newly calculated weights for future reference.
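
An equivalent way to do that node summation in a single call is rowsum(), which groups and sums by node index. A sketch with an invented two-triangle mesh (not what TempLS actually runs, but it gives the same totals):

# Sketch: summing triangle areas onto nodes with rowsum() instead of the jj loop
m=rbind(c(1,2,3),c(2,3,4))    # two triangles (rows of node indices)
A=c(0.8,1.2)                  # their (invented) areas
rowsum(rep(A,3),group=c(m))   # one row per node: its total adjacent area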

The last code in this step just saves the result for the calculation step 4.
 save(x,w,iv,file=R[3])  
}#  end step 3

Here is the complete code for this stage:
if(do[3]){ #stage 3 -  wts from supplied x
 load(R[2])  # ERSST x,iv
 x0=x;i0=cbind(iv,NA,NA,NA,NA); rad=pi/180;
 load(R[1])  # GHCN x,iv
 colnames(i0)=colnames(iv)
 x=rbind(x,x0);iv=rbind(iv,i0)  # combine
 if(kind[3]=="g"){  #Grid
  dl=180/36 # 36*72 grid for wts
  jv=as.matrix(iv[,1:2])%/%dl # latlon integer number for each cell
  y=cos((jv[,1]+.5)*dl*rad)  #trig weighting for area each cell
  kv=drop(jv%*%c(72,1)+1297)  # Give each cell a unique number
  w=array(0,dim(x))
  for(i in 1:ncol(x)){  # over months
   j=which(!is.na(x[,i])) #  stations with data
   s=1/table(kv[j]) # reciprocal of the station count in each occupied cell
   u=as.numeric(names(s)) #cell numbers
   k1=match(kv[j],u) # which cell numbers belong to each stat
   w[j,i]=y[j]*s[k1]
  }
 }else{ # Mesh
  require("geometry")
  require("alphahull")
  a=pc("w_",R[3])
  if(file.exists(a)){load(a)}else{w=array(0,dim(x))}  # old mesh
  a=cos(cbind(iv[,1:2],90-iv[,1:2])*rad)
  z=cbind(a[,1]*a[,c(2,4)],a[,3]) # to 3D coords
  if(!exists("w_yr"))w_yr=yr
  w=w[,mth(w_yr)%in%mth(yr)]
  if(sum(dim(w)==dim(x))<2){w=array(0,dim(x)); }
  for(ii in 1:ncol(x)){
   o=!is.na(x[,ii]) # stats with data
   no=sum(o); if(no<9)next;  # skip months at the end
   if(sum(xor(o,(w[,ii]>0)))==0)next; # w slots match x, so use old mesh
   y=z[o,]
   m=convhulln(y) # convex hull (tri mesh)
   A=0
   B=cbind(y[m[,1],],y[m[,2],],y[m[,3],])
   for(j in 0:2){  # determinants for areas
    k=(c(0:2,0,2,1)+j)%%3+c(1,4,7)
    A=A+B[,k[1]]*B[,k[2]]*B[,k[3]]-B[,k[4]]*B[,k[5]]*B[,k[6]]
   }
   A=abs(A)  # triangles from convhulln have varied orientation
   n=cbind(c(m),rep(A,3)) # long list with areas matched to nodes
   k=which(o)
   w[!o,ii]=0
   w[o,ii]=1e-9  # to mark stations that missed wt but have x entries
   for(jj in 1:99){  # Adding areas for nodes
    i=match(1:sum(o),n[,1]) # find first matches in mesh
    q=which(!is.na(i))
    w[k[q],ii]=w[k[q],ii]+n[i[q],2]  # add in area
    n=rbind(n[-i[q],]) # take out matches
    if(length(n)<1)break;
   }
   print(c(ii,nrow(m),sum(w[,ii])/pi/24))
  } # end mesh months loop
  f=paste("w_",R[3],sep="") # for saving the wts if wanted
  w_yr=yr # save yrs to match when recalled
  if(!file.exists(f))save(w_yr,w,file=f)
 }# end mesh/grid
 save(x,w,iv,file=R[3])  
}#  end step 3

7 comments:

  1. Reply: Thanks, Steven. It's not so very different from the version you worked with in RGHCN. Just a bit better organised, and with a more intuitive solver (next episode).

  2. "convex hull"? Are not the triangles planar?

     Reply: Yes, they are. A tetrahedron is a convex body. A convex hull, in 2D, is what you would get by drawing a string tightly around it. Dips are smoothed with a straight line section. In 3D, it's a plane section. A tetrahedron is the convex hull of its 4 vertices.

     Reply: Well, a tet has 6 edges, but only 4 vertices and faces.