Add temporal bounds and center times for `group_average()` API
Description
- Closes #565
Checklist
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings
- [ ] Any dependent changes have been merged and published in downstream modules
If applicable:
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)
@pochedls and @oliviermarti this PR should address this GH issue (same as this comment from @oliviermarti).
If you can check this branch out and try it that'd be great.
import numpy as np
import pandas as pd
import xarray as xr
import xcdat as xc
# Create a dummy xarray dataset
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
data = np.random.rand(len(time))
dummy_ds = xr.Dataset({"dummy_var": (["time"], data)}, coords={"time": time})
dummy_ds["time"].encoding["calendar"] = "standard"
dummy_ds = dummy_ds.bounds.add_missing_bounds(axes=["T"])
ds_avg = dummy_ds.temporal.group_average("dummy_var", freq="month")
Before -- no time_bnds and time starts at the beginning of the averaged period
ds_avg.time
<xarray.DataArray 'time' (time: 24)> Size: 192B
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
...
dtype=object)
Coordinates:
* time (time) object 192B 2000-01-01 00:00:00 ... 2001-12-01 00:00:00
Attributes:
bounds: time_bnds
Result -- time is now centered using time_bnds
ds_avg.time
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
...
dtype=object)
ds_avg.time_bnds
array([[cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 1, 0, 0, 0, 0, has_year_zero=False)],
...
dtype=object)
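For reference, the centered times above match the midpoint of each pair of bounds. Here is a minimal sketch of that arithmetic, not the actual xcdat implementation: it uses the standard library `datetime` instead of `cftime` (the two agree for these dates), and `center_time` is a hypothetical helper name.

```python
from datetime import datetime


def center_time(lower: datetime, upper: datetime) -> datetime:
    """Return the midpoint of a (lower, upper) time interval."""
    return lower + (upper - lower) / 2


# Monthly bounds matching the first three rows of ds_avg.time_bnds above.
bounds = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),
    (datetime(2000, 3, 1), datetime(2000, 4, 1)),
]

centers = [center_time(lower, upper) for lower, upper in bounds]
for center in centers:
    print(center.isoformat())
```

February 2000 has 29 days, so its midpoint falls on the 15th at 12:00 rather than the 16th.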
@tomvothecoder – this is great – thanks for pushing this forward so quickly.
I think add_missing_bounds will work in most cases, but will fail for seasonal averages (and definitely custom seasons).
I think we'll need to collect the bounds for each group (e.g., `group_bounds_array = [("2000-01-01 00:00", "2000-01-02 00:00"), ("2000-01-02 00:00", "2000-01-03 00:00"), ..., ("2000-01-31 00:00", "2000-02-01 00:00")]`) and then take the min of the lower bounds and the max of the upper bounds (i.e., `group_bnd = [np.min(group_bounds_array[:, 0]), np.max(group_bounds_array[:, 1])]`).
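A rough sketch of that idea, assuming each input time step already has a (lower, upper) bounds pair. It uses only the standard library, so `min`/`max` stand in for `np.min`/`np.max`, and `group_bounds` and its `key` argument are hypothetical names, not part of xcdat.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_bounds(bounds, key):
    """Collapse per-step bounds into one (min lower, max upper) pair per group.

    `key` maps a step's lower bound to its group label.
    """
    grouped = defaultdict(list)
    for lower, upper in bounds:
        grouped[key(lower)].append((lower, upper))

    return {
        label: (min(lo for lo, _ in pairs), max(hi for _, hi in pairs))
        for label, pairs in grouped.items()
    }


# Daily bounds covering January and February 2000 (31 + 29 = 60 days).
start = datetime(2000, 1, 1)
daily_bounds = [
    (start + timedelta(days=i), start + timedelta(days=i + 1)) for i in range(60)
]

# Group by (year, month); a seasonal or custom-season key could be swapped in.
monthly = group_bounds(daily_bounds, key=lambda t: (t.year, t.month))
for label, (lower, upper) in monthly.items():
    print(label, lower.isoformat(), upper.isoformat())
```

Because the bounds come from the members of each group rather than from the averaged time axis, the same approach should carry over to seasons of unequal length, provided the key assigns each step to the right season.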
This makes sense to me. I'll think of an algorithm.