xcdat
Add support for custom seasons spanning calendar years
Description
- Closes #416
- Closes #684
TODO:
- [x] Support custom seasons that span calendar years (`_shift_spanning_months()`)
  - Requires detecting the order of the months in a season. Currently, order does not matter.
  - For example, for `custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]`:
    - `["Nov", "Dec"]` are from the previous year since they are listed before `"Jan"`
    - `["Jan", "Feb", "Mar"]` are from the current year
  - We can potentially extend `_shift_decembers()` to shift other months too. This method shifts the previous year's December to the current year so xarray can properly group "DJF" seasons spanning calendar years.
- [x] Detect and drop incomplete seasons (`_drop_incomplete_seasons()`)
  - Right now xCDAT only detects incomplete "DJF" seasons with `_drop_incomplete_djf()`
  - Replace the boolean config `drop_incomplete_djf` with `drop_incomplete_season`
  - A possible solution for detecting incomplete seasons is to check whether a season has all of its required months. If the count of months for that season does not match the expected count, drop that season.
- [x] Remove the requirement for all 12 months to be included in a custom season
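As an illustration of the month-ordering idea in the first TODO item (a simplified sketch under the stated convention, not the PR's actual `_shift_spanning_months()` implementation), months listed before "Jan" in a custom season can be treated as belonging to the previous calendar year:

```python
def spanning_months(custom_season):
    """Return the months of a custom season that fall in the previous
    calendar year: every month listed before "Jan" in the season.

    Illustrative helper only; the real detection lives inside xCDAT's
    temporal averaging internals.
    """
    if "Jan" not in custom_season:
        return []
    return custom_season[: custom_season.index("Jan")]

print(spanning_months(["Nov", "Dec", "Jan", "Feb", "Mar"]))  # ['Nov', 'Dec']
print(spanning_months(["Jun", "Jul", "Aug"]))                # []
```

Coordinates in those months would then be shifted forward by one year (like the existing December shift for "DJF") so xarray groups the season correctly.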
Checklist
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings
- [ ] Any dependent changes have been merged and published in downstream modules
If applicable:
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)
Example result of `_drop_incomplete_seasons()`:
# Before dropping
# -----------------
# 2000-01, 2000-02, and 2001-12 are months in incomplete "DJF" seasons, so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests.
Would you like to check out this branch to test it out on real data? Also a code review would be appreciated.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (`e3dda64`) to head (`1a91228`).
Additional details and impacted files
@@ Coverage Diff @@
## main #423 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 15
Lines 1546 1579 +33
=========================================
+ Hits 1546 1579 +33
:umbrella: View full report in Codecov by Sentry.
@tomvothecoder sure, I will test it out and review. Thank you for the update!
@tomvothecoder Can this be considered for v0.7.0?
This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.
The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.
> @tomvothecoder Can this be considered for v0.7.0?

> This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.
> The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.
@tomvothecoder no problem. Thank you for consideration.
@tomvothecoder it looks like when a custom season goes beyond the calendar year (e.g., Nov, Dec, Jan), there is an error as follows.
import os
import xcdat as xc

input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")
ds = xc.open_mfdataset(input_data)

# Example of a custom season that spans the calendar year:
custom_seasons = [
    ['Dec', 'Jan'],
]
season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_djf': True}
ds.temporal.group_average("psl", "season", season_config=season_config)
CPU times: user 448 ms, sys: 32.8 ms, total: 481 ms
Wall time: 471 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 4)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00 ... 2013-11-...
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py:862, in DataArray._getitem_coord(self, key)
861 try:
--> 862 var = self._coords[key]
863 except KeyError:
KeyError: 'time.year'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<timed exec> in ?()
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, freq, weighted, keep_weights, season_config)
400 }
401 """
402 self._set_data_var_attrs(data_var)
403
--> 404 return self._averager(
405 data_var,
406 "group_average",
407 freq,
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
870
871 if self._mode == "average":
872 dv_avg = self._average(ds, data_var)
873 elif self._mode in ["group_average", "climatology", "departures"]:
--> 874 dv_avg = self._group_average(ds, data_var)
875
876 # The original time dimension is dropped from the dataset because
877 # it becomes obsolete after the data variable is averaged. When the
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, ds, data_var)
1295 dv = _get_data_var(ds, data_var)
1296
1297 # Label the time coordinates for grouping weights and the data variable
1298 # values.
-> 1299 self._labeled_time = self._label_time_coords(dv[self.dim])
1300
1301 if self._weighted:
1302 time_bounds = ds.bounds.get_bounds("T", var_key=data_var)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords)
1470 >>> dtype='datetime64[ns]')
1471 >>> Coordinates:
1472 >>> * time (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
1473 """
-> 1474 df_dt_components: pd.DataFrame = self._get_df_dt_components(
1475 time_coords, drop_obsolete_cols=True
1476 )
1477 dt_objects = self._convert_df_to_dt(df_dt_components)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords, drop_obsolete_cols)
1534
1535 # Use the TIME_GROUPS dictionary to determine which components
1536 # are needed to form the labeled time coordinates.
1537 for component in TIME_GROUPS[self._mode][self._freq]:
-> 1538 df[component] = time_coords[f"{self.dim}.{component}"].values
1539
1540 # The season frequency requires additional datetime components for
1541 # processing, which are later removed before time coordinates are
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
869 def __getitem__(self, key: Any) -> Self:
870 if isinstance(key, str):
--> 871 return self._getitem_coord(key)
872 else:
873 # xarray-style array indexing
874 return self.isel(indexers=self._item_key_to_dict(key))
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
861 try:
862 var = self._coords[key]
863 except KeyError:
864 dim_sizes = dict(zip(self.dims, self.shape))
--> 865 _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
866
867 return self._replace_maybe_drop_dims(var, name=key)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataset.py in ?(variables, key, dim_sizes)
212 if _contains_datetime_like_objects(ref_var):
213 ref_var = DataArray(ref_var)
214 data = getattr(ref_var.dt, var_name).data
215 else:
--> 216 data = getattr(ref_var, var_name).data
217 virtual_var = Variable(ref_var.dims, data)
218
219 return ref_name, var_name, virtual_var
AttributeError: 'IndexVariable' object has no attribute 'year'
CPU times: user 240 ms, sys: 13.2 ms, total: 253 ms
Wall time: 249 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
CPU times: user 106 ms, sys: 6.96 ms, total: 113 ms
Wall time: 110 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
@lee1043 Thanks for trying this out and providing an example script! I'll debug the stack trace.
Hello,
All these features are very nice, and needed, improvements!
Will this be available soon?
I've made some tests: it still doesn't work in xcdat 0.7.1.
Olivier
Hi @oliviermarti, thanks for your interest in this feature. I've had minimal time to work on it and plan to ramp up development soon. I don't have a set timeline on when it will be done, but I'm hoping within the next few months.
> @tomvothecoder it looks like when custom season go beyond calendar year (Nov, Dec, Jan) there is error as follows. [...]
The issue is that this dataset only has a single year of data and does not have a complete `["Dec", "Jan"]` season with `drop_incomplete_djf=True`.
<xarray.DataArray 'time' (time: 12)> Size: 96B
array([cftime.DatetimeProlepticGregorian(2013, 1, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 3, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 4, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 5, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 6, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 7, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 8, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 9, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 10, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 11, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
* time (time) object 96B 2013-01-16 12:00:00 ... 2013-12-16 12:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
Resulting time coordinates after subsetting for the Dec and Jan coordinates, with the options below:

- `custom_seasons=["Dec", "Jan"]` is a season that spans the calendar year. Months that span the calendar year will shift from the previous year to the next year for proper seasonal grouping.
- `drop_incomplete_djf=True` (now `drop_incomplete_seasons=True`) drops incomplete seasons (seasons where not all monthly time coordinates are found).

   year  season  month  expected_months  actual_months
0  2013  DecJan      1                2              1
1  2014  DecJan     12                2              1

Since the expected and actual month counts don't align, the coordinates are dropped. This results in no time coordinates remaining, leading to `AttributeError: 'IndexVariable' object has no attribute 'year'`.
Workaround: specify `drop_incomplete_seasons=False`.
I also added a clear error message when no complete seasons are found (commit).
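The dropping logic can be sketched with pandas (a rough illustration of the idea, not xCDAT's internal code; the column names mirror the debug table above):

```python
import pandas as pd

# One year of data labeled for a "DecJan" season that spans the calendar
# year: Jan 2013 belongs to the 2013 season, Dec 2013 shifts to 2014.
df = pd.DataFrame(
    {"year": [2013, 2014], "season": ["DecJan", "DecJan"], "month": [1, 12]}
)
expected = {"DecJan": 2}  # the season requires both Dec and Jan

# Count the months actually present per (year, season) group and keep only
# coordinates belonging to groups with the full expected month count.
actual = df.groupby(["year", "season"])["month"].transform("count")
complete = df[actual == df["season"].map(expected)]
print(len(complete))  # 0 -> every season is incomplete, so all coords drop
```

With every coordinate dropped, the later `time.year` lookup has nothing to operate on, which is what surfaces as the `AttributeError` in the traceback.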
After thorough testing, I actually think this PR is close to complete.
Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!
@tomvothecoder: this version seems perfect to me :-)
(I've got the branch feature/416-custom-season-span: correct?)
Thanks!!
I attach my very simple test case, which runs well:
import os, sys
import numpy as np, xarray as xr, xcdat as xc
import cftime  # needed for cftime.num2date below
# Version check
print ( f'{sys.version = }' )
print ( f'{np.__version__ = }' )
print ( f'{xr.__version__ = }' )
print ( f'{xc.__version__ = }' )
print ( f'{xc.__file__ = }' )
# Creates time axis
nt = 36
time = cftime.num2date(np.arange(nt, dtype=float)*30.0 + 15, units="days since 2000-01-01",
                       calendar='360_day', only_use_cftime_datetimes=True,
                       only_use_python_datetimes=False, has_year_zero=None)
time = xr.DataArray(time, dims=('time',), coords=(time,))
# Creates two tests variables
nmonth = np.arange (nt,dtype=float) + 1
nmod = np.mod (np.arange (nt,dtype=float), 12) +1
nmonth = xr.DataArray (nmonth, dims=('time',), coords=(time,) )
nmod = xr.DataArray (nmod , dims=('time',), coords=(time,) )
# Creates dataset
dd = xr.Dataset ( {'nmonth': nmonth, 'nmod':nmod})
# Writes dataset and open it with xcdat
if os.path.exists ('test.nc') : os.remove ('test.nc')
dd.to_netcdf ( 'test.nc', mode='w')
dx = xc.open_dataset ('test.nc', use_cftime=True, decode_times=True).bounds.add_missing_bounds()
print ( 'Check variables')
print (dx.nmonth.values)
print (dx.nmod.values)
print ( 'Year means are correct')
print (dx.temporal.group_average("nmonth", freq="year", weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="year", weighted=True)['nmod' ].values)
print ( 'Standard seasonal means are correct')
print (dx.temporal.group_average("nmonth", freq="season", weighted=True,
season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season", weighted=True,
season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season", weighted=True,
season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season", weighted=True,
season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmod' ].values)
print ( '3 months custom seasons are correct' )
custom_seasons = [ ["Dec", "Jan", "Feb"], ]
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print ( '4 months custom seasons are correct' )
custom_seasons = [ ["Dec", "Jan", "Feb", "Mar"], ]
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
Results:
sys.version = '3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]'
np.__version__ = '1.26.4'
xr.__version__ = '2024.7.0'
xc.__version__ = '0.7.1'
xc.__file__ = '/Users/marti/Unix/XCDAT/xcdat/__init__.py'
Check variables
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18.
19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 1. 2. 3. 4. 5. 6.
7. 8. 9. 10. 11. 12. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.]
Year means are correct
[ 6.5 18.5 30.5]
[6.5 6.5 6.5]
Standard seasonal means are correct
[ 1.5 4. 7. 10. 13. 16. 19. 22. 25. 28. 31. 34. 36. ]
[ 1.5 4. 7. 10. 5. 4. 7. 10. 5. 4. 7. 10. 12. ]
[ 4. 7. 10. 13. 16. 19. 22. 25. 28. 31. 34.]
[ 4. 7. 10. 5. 4. 7. 10. 5. 4. 7. 10.]
3 months custom seasons are correct
[ 1.5 13. 25. 36. ]
[ 1.5 5. 5. 12. ]
[13. 25.]
[5. 5.]
4 months custom seasons are correct
[ 2. 13.5 25.5 36. ]
[ 2. 4.5 4.5 12. ]
[13.5 25.5]
[4.5 4.5]
> @tomvothecoder: this version seems perfect to me :-)
> (I've got the branch feature/416-custom-season-span: correct?)
> Thanks!!
> I attach my very simple test case, which runs well:
Yes that is the correct branch. I'm happy to see your test cases pass! If you're interested, it'd be helpful to test on some real data too.
Tom,
On real data, version 0.7.1 returns a result, but a wrong result. It's hard to have an easy check on real data, hence this academic test case.
I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.
I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables: `time_counter` and `time_centered`. `add_bounds` works nicely on both, but in the end I got:
File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups
I have to rename the `time_counter` variable to `time` to make it work.
IPSL outputs processed and available on ESGF are OK.
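The rename workaround can be done directly in xarray; below is a minimal sketch with a toy dataset standing in for raw IPSL output (the variable names here are illustrative only):

```python
import numpy as np
import xarray as xr

# Toy dataset whose time dimension is named "time_counter", as in raw
# IPSL output (an illustrative stand-in, not real model data).
ds = xr.Dataset(
    {"psl": ("time_counter", np.arange(3.0))},
    coords={"time_counter": np.arange(3)},
)

# Rename the dimension/coordinate to "time" so code that looks up a
# "time" axis (like the groupby call in the traceback above) can find it.
ds = ds.rename({"time_counter": "time"})
print(list(ds.dims))  # ['time']
```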
> On real data, version 0.7.1 returns a result, but a wrong result. It's hard to have an easy check on real data, hence this academic test case.
> I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.
Thank you. @lee1043 is also going to test on CMIP data.
> I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables: `time_counter` and `time_centered`. `add_bounds` works nicely on both, but in the end I got:
> File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups
> I have to rename `time_counter` to `time` to make it work. IPSL outputs processed and available on ESGF are OK.
I pushed 388f4da (#423) to fix #684. Your example code in that issue now works.
> Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!
Thank you @tomvothecoder! -- I will take a look.
Here's an upstream version: https://github.com/pydata/xarray/pull/9524. I could use some help testing it out.