xcdat
Add support for custom seasons spanning calendar years
Description
- Closes #416
- Closes #684
TODO:
- [x] Support custom seasons that span calendar years (`_shift_spanning_months()`)
  - Requires detecting the order of the months in a season. Currently, order does not matter.
  - For example, for `custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]`:
    - `["Nov", "Dec"]` are from the previous year since they are listed before `"Jan"`
    - `["Jan", "Feb", "Mar"]` are from the current year
  - We can potentially extend `_shift_decembers()` to shift other months too. This method shifts the previous year's December to the current year so xarray can properly group "DJF" seasons spanning calendar years.
- [x] Detect and drop incomplete seasons (`_drop_incomplete_seasons()`)
  - Right now xCDAT only detects incomplete "DJF" seasons with `_drop_incomplete_djf()`
  - Replace the boolean config `drop_incomplete_djf` with `drop_incomplete_season`
  - A possible solution for detecting incomplete seasons is to check whether a season has all of its required months. If the count of months for that season does not match the expected count, drop that season.
- [x] Remove the requirement for all 12 months to be included in a custom season
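As an illustration of the month-ordering idea in the first TODO item (a simplified sketch under the stated convention, not the PR's actual `_shift_spanning_months()` implementation), months listed before "Jan" in a custom season can be treated as belonging to the previous calendar year:

```python
def spanning_months(custom_season):
    """Return the months of a custom season that fall in the previous
    calendar year: every month listed before "Jan" in the season.

    Illustrative helper only; the real detection lives inside xCDAT's
    temporal averaging internals.
    """
    if "Jan" not in custom_season:
        return []
    return custom_season[: custom_season.index("Jan")]

print(spanning_months(["Nov", "Dec", "Jan", "Feb", "Mar"]))  # ['Nov', 'Dec']
print(spanning_months(["Jun", "Jul", "Aug"]))                # []
```

Coordinates in those months would then be shifted forward by one year (like the existing December shift for "DJF") so xarray groups the season correctly.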
Checklist
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings
- [ ] Any dependent changes have been merged and published in downstream modules
If applicable:
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)
Example result of `_drop_incomplete_seasons()`:
# Before dropping
# -----------------
# 2000-01, 2000-02, and 2001-12 are months in incomplete "DJF" seasons, so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests.
Would you like to check out this branch to test it out on real data? Also a code review would be appreciated.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (`e3dda64`) to head (`1a91228`).
Additional details and impacted files
@@ Coverage Diff @@
## main #423 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 15
Lines 1546 1579 +33
=========================================
+ Hits 1546 1579 +33
:umbrella: View full report in Codecov by Sentry.
@tomvothecoder sure, I will test it out and review. Thank you for the update!
@tomvothecoder Can this be considered for v0.7.0?
This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.
The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.
> @tomvothecoder Can this be considered for v0.7.0?

> This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.
> The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.
@tomvothecoder no problem. Thank you for consideration.
@tomvothecoder it looks like when a custom season goes beyond the calendar year (e.g., Nov, Dec, Jan), there is an error as follows.
import os
import xcdat as xc

input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")
ds = xc.open_mfdataset(input_data)

# Example of a custom season that spans the calendar year:
custom_seasons = [
    ['Dec', 'Jan'],
]
season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_djf': True}
ds.temporal.group_average("psl", "season", season_config=season_config)
CPU times: user 448 ms, sys: 32.8 ms, total: 481 ms
Wall time: 471 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 4)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00 ... 2013-11-...
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py:862, in DataArray._getitem_coord(self, key)
861 try:
--> 862 var = self._coords[key]
863 except KeyError:
KeyError: 'time.year'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<timed exec> in ?()
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, freq, weighted, keep_weights, season_config)
400 }
401 """
402 self._set_data_var_attrs(data_var)
403
--> 404 return self._averager(
405 data_var,
406 "group_average",
407 freq,
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
870
871 if self._mode == "average":
872 dv_avg = self._average(ds, data_var)
873 elif self._mode in ["group_average", "climatology", "departures"]:
--> 874 dv_avg = self._group_average(ds, data_var)
875
876 # The original time dimension is dropped from the dataset because
877 # it becomes obsolete after the data variable is averaged. When the
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, ds, data_var)
1295 dv = _get_data_var(ds, data_var)
1296
1297 # Label the time coordinates for grouping weights and the data variable
1298 # values.
-> 1299 self._labeled_time = self._label_time_coords(dv[self.dim])
1300
1301 if self._weighted:
1302 time_bounds = ds.bounds.get_bounds("T", var_key=data_var)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords)
1470 >>> dtype='datetime64[ns]')
1471 >>> Coordinates:
1472 >>> * time (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
1473 """
-> 1474 df_dt_components: pd.DataFrame = self._get_df_dt_components(
1475 time_coords, drop_obsolete_cols=True
1476 )
1477 dt_objects = self._convert_df_to_dt(df_dt_components)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords, drop_obsolete_cols)
1534
1535 # Use the TIME_GROUPS dictionary to determine which components
1536 # are needed to form the labeled time coordinates.
1537 for component in TIME_GROUPS[self._mode][self._freq]:
-> 1538 df[component] = time_coords[f"{self.dim}.{component}"].values
1539
1540 # The season frequency requires additional datetime components for
1541 # processing, which are later removed before time coordinates are
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
869 def __getitem__(self, key: Any) -> Self:
870 if isinstance(key, str):
--> 871 return self._getitem_coord(key)
872 else:
873 # xarray-style array indexing
874 return self.isel(indexers=self._item_key_to_dict(key))
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
861 try:
862 var = self._coords[key]
863 except KeyError:
864 dim_sizes = dict(zip(self.dims, self.shape))
--> 865 _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
866
867 return self._replace_maybe_drop_dims(var, name=key)
~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataset.py in ?(variables, key, dim_sizes)
212 if _contains_datetime_like_objects(ref_var):
213 ref_var = DataArray(ref_var)
214 data = getattr(ref_var.dt, var_name).data
215 else:
--> 216 data = getattr(ref_var, var_name).data
217 virtual_var = Variable(ref_var.dims, data)
218
219 return ref_name, var_name, virtual_var
AttributeError: 'IndexVariable' object has no attribute 'year'
CPU times: user 240 ms, sys: 13.2 ms, total: 253 ms
Wall time: 249 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
CPU times: user 106 ms, sys: 6.96 ms, total: 113 ms
Wall time: 110 ms

<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
@lee1043 Thanks for trying this out and providing an example script! I'll debug the stack trace.
Hello,
All these features are very nice, and needed, improvements!
Will this be available soon?
I've made some tests: it still doesn't work in xcdat 0.7.1.
Olivier
Hi @oliviermarti, thanks for your interest in this feature. I've had minimal time to work on it and plan to ramp up development soon. I don't have a set timeline on when it will be done, but I'm hoping within the next few months.
> @tomvothecoder it looks like when custom season go beyond calendar year (Nov, Dec, Jan) there is error as follows. [...]
The issue is that this dataset only has a single year of data and does not have a complete `["Dec", "Jan"]` season with `drop_incomplete_djf=True`.
<xarray.DataArray 'time' (time: 12)> Size: 96B
array([cftime.DatetimeProlepticGregorian(2013, 1, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 2, 15, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 3, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 4, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 5, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 6, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 7, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 8, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 9, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 10, 16, 12, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 11, 16, 0, 0, 0, 0, has_year_zero=True),
cftime.DatetimeProlepticGregorian(2013, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
dtype=object)
Coordinates:
* time (time) object 96B 2013-01-16 12:00:00 ... 2013-12-16 12:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
Resulting time coordinates after subsetting for the Dec and Jan coordinates, with the options below:

- `custom_seasons=["Dec", "Jan"]` is a season that spans the calendar year. Months that span the calendar year will shift from the previous year to the next year for proper seasonal grouping.
- `drop_incomplete_djf=True` (now `drop_incomplete_seasons=True`) drops incomplete seasons (seasons where not all monthly time coordinates are found).

   year  season  month  expected_months  actual_months
0  2013  DecJan      1                2              1
1  2014  DecJan     12                2              1

Since the expected and actual month counts don't align, the coordinates are dropped. This results in no time coordinates remaining, leading to `AttributeError: 'IndexVariable' object has no attribute 'year'`.
Workaround: specify `drop_incomplete_seasons=False`.
I also added a clear error message when no complete seasons are found (commit).
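The dropping logic can be sketched with pandas (a rough illustration of the idea, not xCDAT's internal code; the column names mirror the debug table above):

```python
import pandas as pd

# One year of data labeled for a "DecJan" season that spans the calendar
# year: Jan 2013 belongs to the 2013 season, Dec 2013 shifts to 2014.
df = pd.DataFrame(
    {"year": [2013, 2014], "season": ["DecJan", "DecJan"], "month": [1, 12]}
)
expected = {"DecJan": 2}  # the season requires both Dec and Jan

# Count the months actually present per (year, season) group and keep only
# coordinates belonging to groups with the full expected month count.
actual = df.groupby(["year", "season"])["month"].transform("count")
complete = df[actual == df["season"].map(expected)]
print(len(complete))  # 0 -> every season is incomplete, so all coords drop
```

With every coordinate dropped, the later `time.year` lookup has nothing to operate on, which is what surfaces as the `AttributeError` in the traceback.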
After thorough testing, I actually think this PR is close to complete.
Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!
@tomvothecoder: this version seems perfect to me :-)
(I've got the branch feature/416-custom-season-span: correct?)
Thanks!!
I attach my very simple test case, which runs well:
import os, sys
import numpy as np, xarray as xr, xcdat as xc
import cftime  # needed for cftime.num2date below
# Version check
print ( f'{sys.version = }' )
print ( f'{np.__version__ = }' )
print ( f'{xr.__version__ = }' )
print ( f'{xc.__version__ = }' )
print ( f'{xc.__file__ = }' )
# Creates time axis
nt = 36
time = cftime.num2date(np.arange(nt, dtype=float)*30.0 + 15, units="days since 2000-01-01",
                       calendar='360_day', only_use_cftime_datetimes=True,
                       only_use_python_datetimes=False, has_year_zero=None)
time = xr.DataArray(time, dims=('time',), coords=(time,))
# Creates two tests variables
nmonth = np.arange (nt,dtype=float) + 1
nmod = np.mod (np.arange (nt,dtype=float), 12) +1
nmonth = xr.DataArray (nmonth, dims=('time',), coords=(time,) )
nmod = xr.DataArray (nmod , dims=('time',), coords=(time,) )
# Creates dataset
dd = xr.Dataset ( {'nmonth': nmonth, 'nmod':nmod})
# Writes dataset and open it with xcdat
if os.path.exists ('test.nc') : os.remove ('test.nc')
dd.to_netcdf ( 'test.nc', mode='w')
dx = xc.open_dataset ('test.nc', use_cftime=True, decode_times=True).bounds.add_missing_bounds()
print ( 'Check variables')
print (dx.nmonth.values)
print (dx.nmod.values)
print ( 'Year means are correct')
print (dx.temporal.group_average("nmonth", freq="year", weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="year", weighted=True)['nmod' ].values)
print ( 'Standard seasonal means are correct')
print (dx.temporal.group_average("nmonth", freq="season", weighted=True,
season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season", weighted=True,
season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season", weighted=True,
season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season", weighted=True,
season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmod' ].values)
print ( '3 months custom seasons are correct' )
custom_seasons = [ ["Dec", "Jan", "Feb"], ]
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print ( '4 months custom seasons are correct' )
custom_seasons = [ ["Dec", "Jan", "Feb", "Mar"], ]
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
print (dx.temporal.group_average("nmonth", freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod" , freq="season",
season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',},
weighted=True)['nmod' ].values)
Results:
sys.version = '3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]'
np.__version__ = '1.26.4'
xr.__version__ = '2024.7.0'
xc.__version__ = '0.7.1'
xc.__file__ = '/Users/marti/Unix/XCDAT/xcdat/__init__.py'
Check variables
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18.
19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.]
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 1. 2. 3. 4. 5. 6.
7. 8. 9. 10. 11. 12. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.]
Year means are correct
[ 6.5 18.5 30.5]
[6.5 6.5 6.5]
Standard seasonal means are correct
[ 1.5 4. 7. 10. 13. 16. 19. 22. 25. 28. 31. 34. 36. ]
[ 1.5 4. 7. 10. 5. 4. 7. 10. 5. 4. 7. 10. 12. ]
[ 4. 7. 10. 13. 16. 19. 22. 25. 28. 31. 34.]
[ 4. 7. 10. 5. 4. 7. 10. 5. 4. 7. 10.]
3 months custom seasons are correct
[ 1.5 13. 25. 36. ]
[ 1.5 5. 5. 12. ]
[13. 25.]
[5. 5.]
4 months custom seasons are correct
[ 2. 13.5 25.5 36. ]
[ 2. 4.5 4.5 12. ]
[13.5 25.5]
[4.5 4.5]
> @tomvothecoder: this version seems perfect to me :-)
> (I've got the branch feature/416-custom-season-span: correct?)
> Thanks!!
> I attach my very simple test case, which runs well:
Yes that is the correct branch. I'm happy to see your test cases pass! If you're interested, it'd be helpful to test on some real data too.
Tom,
On real data, version 0.7.1 returns a result, but a wrong result. It's hard to have an easy check on real data, hence this academic test case.
I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.
I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables: `time_counter` and `time_centered`. `add_bounds` works nicely on both, but in the end I got:
File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups
I have to rename the `time_counter` variable to `time` to make it work.
IPSL outputs processed and available on ESGF are OK.
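The rename workaround can be done directly in xarray; below is a minimal sketch with a toy dataset standing in for raw IPSL output (the variable names here are illustrative only):

```python
import numpy as np
import xarray as xr

# Toy dataset whose time dimension is named "time_counter", as in raw
# IPSL output (an illustrative stand-in, not real model data).
ds = xr.Dataset(
    {"psl": ("time_counter", np.arange(3.0))},
    coords={"time_counter": np.arange(3)},
)

# Rename the dimension/coordinate to "time" so code that looks up a
# "time" axis (like the groupby call in the traceback above) can find it.
ds = ds.rename({"time_counter": "time"})
print(list(ds.dims))  # ['time']
```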
> On real data, version 0.7.1 returns a result, but a wrong result. It's hard to have an easy check on real data, hence this academic test case.
> I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.
Thank you. @lee1043 is also going to test on CMIP data.
> I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables: `time_counter` and `time_centered`. `add_bounds` works nicely on both, but in the end I got:
> File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups
> I have to rename `time_counter` to `time` to make it work. IPSL outputs processed and available on ESGF are OK.
I pushed 388f4da (#423) to fix #684. Your example code in that issue now works.
> Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!
Thank you @tomvothecoder! -- I will take a look.
Here's an upstream version: https://github.com/pydata/xarray/pull/9524. I could use some help testing it out.