Cell Boundary aware operations

Open andersy005 opened this issue 4 years ago • 17 comments

For the last few months, I've been working on xgriddedaxis, a tool for working with one-dimensional axes and their cell-boundary information. xgriddedaxis was motivated by the fact that xarray is not aware of cell boundary variables when performing operations such as resampling along the time coordinate. Its main objective is to provide a set of utilities for translating data between different intervals while staying aware of the cell boundary variables.

Is this something that falls within cf-xarray's scope? If so, I am happy to help out with the implementation for this in cf-xarray.

Ccing @kmpaul, @matt-long

andersy005 avatar Jun 01 '20 18:06 andersy005

This issue may overlap with #9, #8

andersy005 avatar Jun 01 '20 18:06 andersy005

Based on the examples in the README, it sounds like xgriddedaxis should be included in cf-xarray. It depends on the bounds attribute, which implies CF-compliance, and it supplies some excellent remapping utilities.

kmpaul avatar Jun 01 '20 21:06 kmpaul

@andersy005 I'm not sure if I am remembering this correctly, but I recall that we envisioned xgriddedaxis fitting into the existing xarray resampling workflow if and only if the data variables had a bounds attribute and a corresponding bounds variable. Correct? In that sense, this seems to fit best into cf-xarray.

kmpaul avatar Jun 01 '20 23:06 kmpaul

  1. One solution here is to decode the bounds attribute to a Pandas IntervalIndex or a future xarray CellIndex. (https://github.com/pydata/xarray/issues/1475)
  2. It may be prudent to wait till xarray’s Index API is clear. There has been some discussion of allowing “regridders” to hook into xarray’s indexing functionality. (https://github.com/pydata/xarray/issues/486, https://github.com/pydata/xarray/issues/475)
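
For illustration, here is a minimal sketch of what point 1 could look like with pandas alone. The `time_bounds` array and its values are made up for the example; a real CF dataset would supply them through the variable named in the `bounds` attribute. This is not how xarray or cf-xarray decode bounds today.

```python
import numpy as np
import pandas as pd

# Hypothetical CF-style bounds array with shape (n_cells, 2):
# each row holds the lower and upper edge of one time cell.
time_bounds = np.array(
    [
        ["2000-01-01", "2000-02-01"],
        ["2000-02-01", "2000-03-01"],
        ["2000-03-01", "2000-04-01"],
    ],
    dtype="datetime64[ns]",
)

# Build an IntervalIndex whose cells are closed on the left, open on the right.
cells = pd.IntervalIndex.from_arrays(
    time_bounds[:, 0], time_bounds[:, 1], closed="left"
)

# Cell widths fall out directly and can serve as averaging weights.
cell_days = cells.length / pd.Timedelta(days=1)

# A point lookup resolves to the position of the cell containing the timestamp.
position = cells.get_loc(pd.Timestamp("2000-02-15"))
```

The choice of `closed="left"` is an assumption; CF does not fix which edge a cell owns, so a real decoder would need to pick a convention.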

xarray has CZI funding to make good progress on explicit indexes this year, so some answers to the above points should be available in the next few months.

However, should cf-xarray provide an interim solution? If so, I'd favour an explicit opt-in ds.cf.resample(T="D", weight_bounds=True).

dcherian avatar Jun 12 '20 18:06 dcherian

Personally, since I think we have an interim solution already, I like the idea of the ds.cf.resample interface.

kmpaul avatar Jun 15 '20 18:06 kmpaul

I agree.

But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.

dcherian avatar Jun 15 '20 19:06 dcherian

Ah. Yes. Agree. 👍

kmpaul avatar Jun 15 '20 19:06 kmpaul

> But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.

Will this generate new bounds as well or will it just use existing bounds? In other words, with ds.cf.resample(time="D", weight_bounds=True), will the user get a new time coordinate with corresponding time_bounds?

andersy005 avatar Jun 17 '20 20:06 andersy005

New bounds would be a good idea IMO.
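
A rough sketch of what "weight by bounds and attach new bounds" might amount to, written with plain xarray since `weight_bounds=True` does not exist yet. The dataset, the variable names (`tas`, `time_bounds`, `nbnd`) and the values are invented for the example, and it assumes no input cell straddles a month boundary.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up irregular time cells; none of them crosses a month boundary.
lower = pd.to_datetime(["2000-01-01", "2000-01-11", "2000-02-01", "2000-02-21"])
upper = pd.to_datetime(["2000-01-11", "2000-02-01", "2000-02-21", "2000-03-01"])

ds = xr.Dataset(
    {
        "tas": ("time", [1.0, 2.0, 3.0, 4.0]),
        "time_bounds": (
            ("time", "nbnd"),
            np.stack([lower.values, upper.values], axis=1),
        ),
    },
    coords={"time": lower},
)
ds["time"].attrs["bounds"] = "time_bounds"

# Weight each sample by the width of its cell, in days.
width = ds["time_bounds"].isel(nbnd=1) - ds["time_bounds"].isel(nbnd=0)
weights = width / np.timedelta64(1, "D")

# Weighted mean per month: sum(w * x) / sum(w) within each output cell.
weighted_sum = (ds["tas"] * weights).resample(time="MS").sum()
total_weight = weights.resample(time="MS").sum()
monthly = weighted_sum / total_weight

# Generate bounds for the new monthly cells: [month start, next month start).
new_lower = monthly.indexes["time"]
new_upper = new_lower + pd.offsets.MonthBegin(1)
new_bounds = xr.DataArray(
    np.stack([new_lower.values, new_upper.values], axis=1),
    dims=("time", "nbnd"),
    coords={"time": monthly["time"]},
)

result = xr.Dataset({"tas": monthly, "time_bounds": new_bounds})
result["time"].attrs["bounds"] = "time_bounds"
```

With equal-width cells this reduces to the ordinary `resample(...).mean()`; the two differ only when the cell widths are uneven, as they are here.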

dcherian avatar Jun 17 '20 20:06 dcherian

This discussion also lines up with the "Climatological Statistics" section of the conventions: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#climatological-statistics

This seems like a useful feature under the ds.cf.clim namespace at least for simple reductions like cf.clim.mean(). But we shouldn't be reimplementing anything in xclim (https://xclim.readthedocs.io/en/stable/indicators.html)

dcherian avatar Jun 18 '20 17:06 dcherian

I'm concerned about colliding namespaces. Is it fair to say that we could implement functionality in xclim in a thin layer of cf-xarray, where it makes sense? That is, make xclim a dependency for some features of cf-xarray?

kmpaul avatar Jun 18 '20 19:06 kmpaul

As far as I can tell, xclim doesn't do climatologies so there should be no overlap or collision.

dcherian avatar Jun 18 '20 19:06 dcherian

Ok. I guess I need to look more closely at xclim and learn what it actually does. 😄

kmpaul avatar Jun 18 '20 19:06 kmpaul

xclim was initially designed to compute "climate indicators", e.g. cooling_degree_days, maximum_annual_precipitation, etc. Since then, we've added bias correction algorithms and some ensemble analysis functionalities, but there has been no effort invested so far on climatological means.

If cf-xarray implemented climatological operations that enforce CF-Conventions, this is something we would certainly be interested in integrating. See https://github.com/Ouranosinc/xclim/issues/74 for some related work that has stalled.

huard avatar Sep 02 '20 18:09 huard

@malmans2 what do you think about this issue?

dcherian avatar Feb 24 '21 16:02 dcherian

The code in xgriddedaxis is really small (!) so I think it's OK to copy it over.

I like the .cf.resample(time="M", weight_bounds=True) opt-in API. Or we could do .cf.weighted_resample(time="M").

Under the hood, I think this should do

weights = cfxr.resample_weights(input, output, freq)
obj.weighted(weights).resample(time=freq)

which would require solving https://github.com/pydata/xarray/issues/3937 first. We'd also have to test out using sparse weights in the weighted operation.
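
To make the weights idea concrete, here is a small NumPy sketch of one way such weights could be computed: the fractional overlap between each output cell and each input cell. The function name `overlap_weights` and the numeric bounds are hypothetical; this is not the xgriddedaxis implementation or an existing cf-xarray function.

```python
import numpy as np


def overlap_weights(src_bounds, dst_bounds):
    """Return an (n_dst, n_src) matrix of overlap weights.

    Entry [i, j] is the length of the overlap between output cell i and
    input cell j, divided by the total overlap of output cell i, so each
    row with any overlap sums to 1.
    """
    src_lo, src_hi = src_bounds[:, 0], src_bounds[:, 1]
    dst_lo, dst_hi = dst_bounds[:, 0], dst_bounds[:, 1]

    # Pairwise intersection of every output cell with every input cell.
    lo = np.maximum(dst_lo[:, None], src_lo[None, :])
    hi = np.minimum(dst_hi[:, None], src_hi[None, :])
    overlap = np.clip(hi - lo, 0.0, None)

    # Normalise each row; rows with no overlap stay all zero.
    total = overlap.sum(axis=1, keepdims=True)
    return np.divide(overlap, total, out=np.zeros_like(overlap), where=total > 0)


# Three input cells of width 2 remapped onto two output cells of width 3.
src = np.array([[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]])
dst = np.array([[0.0, 3.0], [3.0, 6.0]])
weights = overlap_weights(src, dst)

values = np.array([1.0, 2.0, 3.0])
remapped = weights @ values
```

Each output cell overlaps only a handful of input cells, so for long axes most of this matrix is zero, which is why sparse storage is worth testing.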

We'd have to add a CFResample class that attaches new bounds after an operation like .mean

We can explore upstreaming the weighting during discussions of the new xarray indexing API.

dcherian avatar Feb 24 '21 16:02 dcherian

I've started here with groupby.weighted which will also allow .resample.weighted: https://github.com/pydata/xarray/pull/5480.

dcherian avatar Jun 17 '21 15:06 dcherian