cf-xarray
Cell Boundary aware operations
For the last few months, I've been working on xgriddedaxis, a tool for working with one-dimensional axes and their cell boundary information. xgriddedaxis was motivated by the fact that xarray is not aware of cell boundary variables when performing operations such as resampling along the time coordinate. The main objective of xgriddedaxis is to provide a set of utilities that enable fluid translation between data at different intervals while remaining aware of the cell boundary variables.
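To make the motivation concrete, here is a minimal plain-Python sketch (not xgriddedaxis code) of how a bounds-unaware mean differs from one weighted by cell width. The monthly values are made up for illustration, and the bounds are assumed to be half-open `[start, end)` intervals.

```python
from datetime import date

# Monthly cells for 2001, each described by its [start, end) bounds.
starts = [date(2001, m, 1) for m in range(1, 13)]
ends = starts[1:] + [date(2002, 1, 1)]
bounds = list(zip(starts, ends))

# Illustrative monthly means (made-up values): 1.0 for January ... 12.0 for December.
values = [float(m) for m in range(1, 13)]

# Bounds-unaware reduction: every month counts equally.
naive_mean = sum(values) / len(values)

# Bounds-aware reduction: weight each month by its cell width in days.
widths = [(end - start).days for start, end in bounds]
weighted_mean = sum(v * w for v, w in zip(values, widths)) / sum(widths)

print(naive_mean, weighted_mean)
```

The two results differ only slightly here because calendar months have similar lengths; the gap grows when cells have very different widths.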
Is this something that falls within cf-xarray's scope? If so, I am happy to help out with the implementation for this in cf-xarray.
Ccing @kmpaul, @matt-long
This issue may overlap with #9, #8
Based on the examples in the README, it sounds like xgriddedaxis should be included in cf-xarray. It depends on the bounds attribute, which implies CF compliance, and it supplies some excellent remapping utilities.
@andersy005 I'm not sure if I am remembering this correctly, but my memory tells me that we envisioned that xgriddedaxis would fit into the existing xarray workflow for resampling if and only if the data variables had a bounds attribute and a corresponding variable. Correct? In that sense, this seems to fit best into cf-xarray.
- One solution here is to decode the bounds attribute to a Pandas IntervalIndex or a future xarray CellIndex. (https://github.com/pydata/xarray/issues/1475)
- It may be prudent to wait till xarray’s Index API is clear. There has been some discussion of allowing “regridders” to hook into xarray’s indexing functionality. (https://github.com/pydata/xarray/issues/486, https://github.com/pydata/xarray/issues/475)
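As a rough illustration of the first option, a CF-style bounds array can be turned into a pandas `IntervalIndex` with `pd.IntervalIndex.from_arrays`. The bounds values below are made up, and treating cells as closed on the left is an assumption for this sketch rather than something the bounds attribute guarantees.

```python
import numpy as np
import pandas as pd

# CF-style bounds variable: shape (n_cells, 2), one (start, end) pair per cell.
time_bounds = np.array(
    [
        ["2001-01-01", "2001-02-01"],
        ["2001-02-01", "2001-03-01"],
        ["2001-03-01", "2001-04-01"],
    ],
    dtype="datetime64[ns]",
)

# Assumption: cells are half-open, [start, end).
cells = pd.IntervalIndex.from_arrays(
    time_bounds[:, 0], time_bounds[:, 1], closed="left"
)

# Cell widths fall out of the index directly.
cell_lengths = cells.length

# Look up which cell contains a given timestamp.
idx = cells.get_loc(pd.Timestamp("2001-02-14"))
```

With the cells stored as intervals, both the widths needed for weighting and point-in-cell lookups come from the index itself rather than from a separate bounds variable.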
xarray has CZI funding to make good progress on explicit indexes this year, so some answers to the above points should be available in the next few months.
However, should cf-xarray provide an interim solution? If so, I'd favour an explicit opt-in ds.cf.resample(T="D", weight_bounds=True).
Personally, since I think we have an interim solution already, I like the idea of the ds.cf.resample interface.
I agree.
But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.
Ah. Yes. Agree. 👍
> But since all the other wrapped functions just rewrite the arguments, I think it would be good to make the use of bounds an explicit opt-in behaviour. This would also let you do ds.cf.resample(T="D") when the bounds are absent.
Will this generate new bounds as well or will it just use existing bounds? In other words, with ds.cf.resample(time="D", weight_bounds=True), will the user get a new time coordinate with corresponding time_bounds?
New bounds would be a good idea IMO.
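One way new bounds could be derived when coarsening is to merge the source cells that fall into each output group, taking the first start and the last end. The helper below, `coarsen_bounds`, is a hypothetical name used only for this sketch; it assumes the source cells are sorted and contiguous.

```python
from datetime import date
from itertools import groupby


def coarsen_bounds(bounds, key):
    """Merge sorted, contiguous (start, end) cells into one cell per group.

    `key` maps a cell's start to its group label (e.g. the year).
    """
    new_bounds = []
    for _, group in groupby(bounds, key=lambda cell: key(cell[0])):
        cells = list(group)
        # New cell spans from the first start to the last end in the group.
        new_bounds.append((cells[0][0], cells[-1][1]))
    return new_bounds


# Monthly [start, end) cells covering 2001 and 2002.
starts = [date(y, m, 1) for y in (2001, 2002) for m in range(1, 13)]
ends = starts[1:] + [date(2003, 1, 1)]
monthly_bounds = list(zip(starts, ends))

# Coarsen to one cell per year.
annual_bounds = coarsen_bounds(monthly_bounds, key=lambda d: d.year)
print(annual_bounds)
```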
This discussion also lines up with the "Climatological Statistics" section of the conventions: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#climatological-statistics
This seems like a useful feature under the ds.cf.clim namespace, at least for simple reductions like cf.clim.mean(). But we shouldn't be reimplementing anything in xclim (https://xclim.readthedocs.io/en/stable/indicators.html).
I'm concerned about colliding namespaces. Is it fair to say that we could implement functionality in xclim in a thin layer of cf-xarray, where it makes sense? That is, make xclim a dependency for some features of cf-xarray?
As far as I can tell, xclim doesn't do climatologies, so there should be no overlap or collision.
Ok. I guess I need to look more closely at xclim and learn what it actually does. 😄
xclim was initially designed to compute "climate indicators", e.g. cooling_degree_days, maximum_annual_precipitation, etc. Since then, we've added bias correction algorithms and some ensemble analysis functionalities, but there has been no effort invested so far on climatological means.
If cf-xarray implemented climatological operations that enforce CF-Conventions, this is something we would certainly be interested in integrating. See https://github.com/Ouranosinc/xclim/issues/74 for some related work that has stalled.
@malmans2 what do you think about this issue?
The code in xgriddedaxis is really small (!) so I think it's OK to copy it over.
I like the .cf.resample(time="M", weight_bounds=True) opt-in API. Or we could do .cf.weighted_resample(time="M").
Under the hood, I think this should do
weights = cfxr.resample_weights(input, output, freq)
obj.weighted(weights).resample(time=freq)
which would require solving https://github.com/pydata/xarray/issues/3937 first. We'd also have to test out using sparse weights in the weighted operation.
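To illustrate what such weights might look like, here is a minimal plain-Python sketch of an overlap-based weight matrix between input and output cell bounds. The function names and signatures are assumptions made for this example (they do not match the `cfxr.resample_weights(input, output, freq)` call above, which does not exist yet), and the sketch assumes each output cell is fully covered by the input cells.

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def resample_weights(in_bounds, out_bounds):
    """Row i holds the fraction of output cell i covered by each input cell."""
    weights = []
    for out_cell in out_bounds:
        width = out_cell[1] - out_cell[0]
        weights.append([overlap(out_cell, in_cell) / width for in_cell in in_bounds])
    return weights


def apply_weights(weights, values):
    """Weighted average of the input values for each output cell."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Three input cells of unequal width, remapped onto two output cells.
in_bounds = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
out_bounds = [(0.0, 2.0), (2.0, 4.0)]
values = [1.0, 3.0, 5.0]

weights = resample_weights(in_bounds, out_bounds)
resampled = apply_weights(weights, values)
print(weights, resampled)
```

Each row is non-zero only for the few input cells that overlap the output cell, which is why a sparse representation is attractive for long axes.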
We'd have to add a CFResample class that attaches new bounds after an operation like .mean.
We can explore upstreaming the weighting during discussions of the new xarray indexing API.
I've started here with groupby.weighted, which will also allow .resample.weighted: https://github.com/pydata/xarray/pull/5480.