xcdat icon indicating copy to clipboard operation
xcdat copied to clipboard

[Bug]: Issue with decoding dataset datetime that include NaN values (`cftime` and linux bug)

Open ezekiel-lemur opened this issue 2 months ago • 18 comments

What happened?

Issue with decoding datetime that include NaN values

What did you expect to happen? Are there are possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

path1 = '/nethome/0641332/Thesis/Data/Parcels/standard6Controlnew_run.nc'
ds = xr.open_dataset(path1, decode_times=False)
ds['time'].attrs['axis'] = 'time'
ds['time'].attrs['long_name'] = 'time'
ds = ds.set_coords(("time"))
ds_decoded = xcdat.decode_time(ds)


the ds file looks like:
<xarray.Dataset>
Dimensions:     (trajectory: 92678, obs: 920)
Coordinates:
  * obs         (obs) int32 0 1 2 3 4 5 6 7 ... 912 913 914 915 916 917 918 919
    time        (trajectory, obs) float64 3.478e+08 3.477e+08 ... nan nan
  * trajectory  (trajectory) int64 146163 146164 146165 ... 59605 60362 65431
Data variables:
    S           (trajectory, obs) float32 ...
    T           (trajectory, obs) float32 ...
    age         (trajectory, obs) float32 ...
    lat         (trajectory, obs) float32 ...
    lon         (trajectory, obs) float32 ...
    transport   (trajectory) float32 ...
    z           (trajectory, obs) float32 ...
Attributes:
    Conventions:            CF-1.6/CF-1.7
    feature_type:           trajectory
    ncei_template_version:  NCEI_NetCDF_Trajectory_Template_v2.0
    parcels_kernels:        SampleParticleAdvectionRK4_3DSample_T_SCheckPreBo...
    parcels_mesh:           spherical
    parcels_version:        3.0.2


and the time axis:
(trajectory, obs)
float64
3.478e+08 3.477e+08 ... nan nan
axis : time
calendar : noleap
long_name : time
standard_name :time
units : seconds since 0488-02-01 00:00:00

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[54], line 1
----> 1 ds_decoded = xcdat.decode_time(ds)

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/dataset.py:370, in decode_time(dataset)
    367 ds = ds.assign_coords({coords.name: decoded_time})
    369 try:
--> 370     bounds = ds.bounds.get_bounds("T", var_key=coords.name)
    371 except KeyError:
    372     bounds = None

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/bounds.py:228, in BoundsAccessor.get_bounds(self, axis, var_key)
    199 def get_bounds(
    200     self, axis: CFAxisKey, var_key: Optional[str] = None
    201 ) -> Union[xr.Dataset, xr.DataArray]:
    202     """Gets coordinate bounds.
    203 
    204     Parameters
   (...)
    226         If bounds were not found for the specific ``axis``.
    227     """
--> 228     self._validate_axis_arg(axis)
    230     if var_key is None:
    231         # Get all bounds keys in the Dataset for this axis.
    232         bounds_keys = self._get_bounds_keys(axis)

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/bounds.py:855, in BoundsAccessor._validate_axis_arg(self, axis)
    850     keys = ", ".join(f"'{key}'" for key in cf_axis_keys)
    851     raise ValueError(
    852         f"Incorrect 'axis' argument value. Supported values include {keys}."
    853     )
--> 855 get_dim_coords(self._dataset, axis)

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/axis.py:132, in get_dim_coords(obj, axis)
    127 index_keys = obj.indexes.keys()
    129 # Attempt to map the axis it all of its coordinate variable(s) using the
    130 # axis and coordinate names in the object attributes (if they are set).
    131 # Example: Returns ["time", "time_centered"] with `axis="T"`
--> 132 coord_keys = _get_all_coord_keys(obj, axis)
    133 # Filter the index keys to just the dimension coordinate keys.
    134 # Example: Returns ["time"], since "time_centered" is not in `index_keys`
    135 dim_coord_keys = list(set(index_keys) & set(coord_keys))

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/axis.py:317, in _get_all_coord_keys(obj, axis)
    314     pass
    316 try:
--> 317     keys = keys + obj.cf.coordinates[cf_attrs["coordinate"]]
    318 except KeyError:
    319     pass

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:1659, in CFAccessor.coordinates(self)
   1642 @property
   1643 def coordinates(self) -> dict[str, list[Hashable]]:
   1644     """
   1645     Mapping valid Coordinate standard names for ``.cf[]`` to variable names.
   1646 
   (...)
   1657         Values are lists of variable names that match that particular key.
   1658     """
-> 1659     vardict = {key: _get_coords(self._obj, key) for key in _COORD_NAMES}
   1661     return {k: sort_maybe_hashable(v) for k, v in vardict.items() if v}

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:505, in _get_coords(obj, key)
    500 def _get_coords(obj: DataArray | Dataset, key: Hashable) -> list[Hashable]:
    501     """
    502     One or more of ('X', 'Y', 'Z', 'T', 'longitude', 'latitude', 'vertical', 'time',
    503     'area', 'volume'), or arbitrary measures, or standard names present in .coords
    504     """
--> 505     return [k for k in _get_all(obj, key) if k in obj.coords or k in obj.dims]

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:480, in _get_all(obj, key)
    467 """
    468 One or more of ('X', 'Y', 'Z', 'T', 'longitude', 'latitude', 'vertical', 'time',
    469 'area', 'volume'), or arbitrary measures, or standard names
    470 """
    471 all_mappers: tuple[Mapper] = (
    472     _get_custom_criteria,
    473     functools.partial(_get_custom_criteria, criteria=cf_role_criteria),  # type: ignore[assignment]
   (...)
    478     _get_with_standard_name,
    479 )
--> 480 results = apply_mapper(all_mappers, obj, key, error=False, default=None)
    481 return results

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:137, in apply_mapper(mappers, obj, key, error, default)
    135 results = []
    136 for mapper in mappers:
--> 137     results.append(_apply_single_mapper(mapper))
    139 flat = list(itertools.chain(*results))
    140 # de-duplicate

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:121, in apply_mapper.<locals>._apply_single_mapper(mapper)
    119 def _apply_single_mapper(mapper):
    120     try:
--> 121         results = mapper(obj, key)
    122     except (KeyError, ValueError) as e:
    123         if error or "I expected only one." in repr(e):

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/cf_xarray/accessor.py:320, in _get_axis_coord(obj, key)
    317     results.update((coord,))
    318 if criterion == "units":
    319     # deal with pint-backed objects
--> 320     units = getattr(var.data, "units", None)
    321     if units in expected:
    322         results.update((coord,))

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xarray/core/dataarray.py:756, in DataArray.data(self)
    744 @property
    745 def data(self) -> Any:
    746     """
    747     The DataArray's data as an array. The underlying array type
    748     (e.g. dask, sparse, pint) is preserved.
   (...)
    754     DataArray.values
    755     """
--> 756     return self.variable.data

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xarray/core/variable.py:416, in Variable.data(self)
    414     return self._data
    415 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 416     return self._data.get_duck_array()
    417 else:
    418     return self.values

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xarray/coding/variables.py:74, in _ElementwiseFunctionArray.get_duck_array(self)
     73 def get_duck_array(self):
---> 74     return self.func(self.array.get_duck_array())

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xcdat/dataset.py:649, in _get_cftime_coords(offsets, units, calendar)
    646 units_type, ref_date = units.split(" since ")
    648 if units_type not in NON_CF_TIME_UNITS:
--> 649     return decode_cf_datetime(offsets, units, calendar=calendar, use_cftime=True)
    651 offsets = np.asarray(offsets)
    652 flat_offsets = offsets.ravel()

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xarray/coding/times.py:342, in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    340                 dates = cftime_to_nptime(dates)
    341 elif use_cftime:
--> 342     dates = _decode_datetime_with_cftime(flat_num_dates, units, calendar)
    343 else:
    344     dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)

File ~/miniconda3/envs/parcels/lib/python3.12/site-packages/xarray/coding/times.py:237, in _decode_datetime_with_cftime(num_dates, units, calendar)
    234     raise ModuleNotFoundError("No module named 'cftime'")
    235 if num_dates.size > 0:
    236     return np.asarray(
--> 237         cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
    238     )
    239 else:
    240     return np.array([], dtype=object)

File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date()

File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array()

TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeNoLeap' and 'NoneType'

Anything else we need to know?

No response

Environment

version('xcdat'): '0.7.0'

INSTALLED VERSIONS

commit: None python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-513.24.1.el8_9.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.14.2 libnetcdf: 4.9.2

xarray: 2023.11.0 pandas: 2.1.4 numpy: 1.26.3 scipy: 1.11.4 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: 1.3.5 dask: 2023.12.0 distributed: 2023.12.0 matplotlib: 3.8.0 cartopy: 0.22.0 seaborn: None numbagg: None fsspec: 2023.10.0 cupy: None pint: None sparse: 0.15.1 flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: 7.4.3 mypy: None IPython: 8.20.0 sphinx: None

ezekiel-lemur avatar Apr 23 '24 11:04 ezekiel-lemur

Thanks for opening this issue @ezekiel-lemur! We'll take a look at it.

tomvothecoder avatar Apr 23 '24 16:04 tomvothecoder

It looks like the data used here does not have standard or conventional time axis. @ezekiel-lemur would it be possible to share your data so one can try reproduce your error?

lee1043 avatar Apr 23 '24 16:04 lee1043

Thank you @lee1043 and @tomvothecoder ! I have uploaded a small subset of the data (rightfully named subset.nc), it breaks with my minimum breaking example too.

subset.nc.zip

ezekiel-lemur avatar Apr 23 '24 20:04 ezekiel-lemur

@ezekiel-lemur thanks for the data. I was able to reproduce the error.

I think the issue might be related to the 2-dimensional time coordinate. In most case xcdat presumes dataArray for coordinate or axis to be 1-dimensional (@tomvothecoder please correct me if I am wrong), but your time dimension has 2 dimensions. And in the dataset, time coordinate is not associated with any of data variables (e.g., S, T, age, ...).

It looks like there are different observation points and each of them has different trajectory (not sure trajectory here means time or not). I also noticed that the time coordinate repeated values for different trajectories, which I got confused for the structure of the data. Can you say little bit more about your data? e.g., expected dimensions for each variables?

lee1043 avatar Apr 23 '24 21:04 lee1043

In most case xcdat presumes dataArray for coordinate or axis to be 1-dimensional (@tomvothecoder please correct me if I am wrong)

Thanks for helping debug, Jiwoo! Yes this is correct, xCDAT expects 1-D coordinates.

tomvothecoder avatar Apr 23 '24 22:04 tomvothecoder

Why can't I reproduce the error (just tried 0.6.1 and 0.7.0)?

# imports
import xcdat as xc

# I/0
fn = 'subset.nc'
ds = xc.open_dataset(fn, decode_times=False)
dsd = xc.decode_time(ds)
dsd.time.values

/opt/homebrew/Caskroom/miniconda/base/envs/xcdat/lib/python3.12/site-packages/xarray/coding/times.py:240: RuntimeWarning: invalid value encountered in cast cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True) Out[46]: array([[cftime.DatetimeNoLeap(499, 2, 11, 0, 0, 0, 0, has_year_zero=True), cftime.DatetimeNoLeap(499, 2, 10, 0, 0, 0, 0, has_year_zero=True), cftime.DatetimeNoLeap(499, 2, 9, 0, 0, 0, 0, has_year_zero=True), ...,

pochedls avatar Apr 24 '24 02:04 pochedls

Why can't I reproduce the error (just tried 0.6.1 and 0.7.0)?

# imports
import xcdat as xc

# I/0
fn = 'subset.nc'
ds = xc.open_dataset(fn, decode_times=False)
dsd = xc.decode_time(ds)
dsd.time.values

@pochedls that's interesting... I still get the same error when running your code. My xcdat version is 0.6.1.

lee1043 avatar Apr 24 '24 05:04 lee1043

Thank you for looking into it.Indeed the data is from a Lagrangian backtracking experiment, where along one particle trajectory observations are made (at given times). Do you think it would be possible to convert each dimension separately with xcdat? 

ezekiel-lemur avatar Apr 24 '24 07:04 ezekiel-lemur

This is working for me on an M2 Mac (updated xarray and xcdat), but not linux. The problem on linux traces to cftime:

cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)

Both Mac and Linux are using cftime 1.6.2.

On Linux the error I get is:

# imports
import xcdat as xc
import cftime

# I/0
fn = 'subset.nc'
ds = xc.open_dataset(fn, decode_times=False)
cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)

<ipython-input-4-abe4db8192af>:1: RuntimeWarning: invalid value encountered in cast cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In [4], line 1 ----> 1 cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True) File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date() File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array() TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'

If someone else could verify this on Mac, then I think this might be isolated to a cftime + linux issue.

pochedls avatar Apr 24 '24 15:04 pochedls

If someone else could verify this on Mac, then I think this might be isolated to a cftime + linux issue.

I created a fresh environment with conda create -n xcdat_070 -c conda-forge xcdat=0.7.0 xarray=2024.3.0 cftime=1.6.3.

I ran both versions of the code here and here on my M2 Mac and both worked, although I get a RuntimeWarning.

xcdat.__version__
'0.7.0'

xarray.__version__
'2024.3.0'  # Also worked with '2024.2.0' and '2023.11.0'

cftime.__version__
'1.6.3'. # Also worked with '1.6.2'
<ipython-input-2-858b1454e938>:5: RuntimeWarning: invalid value encountered in cast
  cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)
array([[cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       ...,
       [cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(499, 2, 8, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 7, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(499, 2, 6, 0, 0, 0, 0, has_year_zero=False),
        ...,
        cftime.DatetimeGregorian(496, 8, 5, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(496, 8, 4, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(488, 2, 1, 0, 0, 0, 0, has_year_zero=False)]],
      dtype=object)

tomvothecoder avatar Apr 24 '24 16:04 tomvothecoder

I ran the same sets of code in my above comment on Linux (RH7) and I get the same error.

I think this confirms it is a cftime + Linux issue. I'll open a ticket over on the cftime repo.

UPDATE: Just opened a ticket on the cftime repo here

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File [/home/vo13/xCDAT/xcdat/648_qa.py:5](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:5)
      [3](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:3) fn = 'subset.nc'
      [4](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:4) ds = xc.open_dataset(fn, decode_times=False)
----> [5](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:5) cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)

File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date()

File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array()

TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'

tomvothecoder avatar Apr 24 '24 16:04 tomvothecoder

I ran the same sets of code in my above comment on Linux (RH7) and I get the same error.

I think this confirms it is a cftime + Linux issue. I'll open a ticket over on the cftime repo.

UPDATE: Just opened a ticket on the cftime repo here

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File [/home/vo13/xCDAT/xcdat/648_qa.py:5](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:5)
      [3](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:3) fn = 'subset.nc'
      [4](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:4) ds = xc.open_dataset(fn, decode_times=False)
----> [5](https://vscode-remote+ssh-002dremote-002bacme1.vscode-resource.vscode-cdn.net/home/vo13/xCDAT/xcdat/648_qa.py:5) cftime.num2date(ds.time.values, ds.time.units, 'standard', only_use_cftime_datetimes=True)

File src/cftime/_cftime.pyx:630, in cftime._cftime.num2date()

File src/cftime/_cftime.pyx:499, in cftime._cftime.decode_dates_from_array()

TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'

Thanks for following up with cftime :)

ezekiel-lemur avatar Apr 24 '24 18:04 ezekiel-lemur

@ezekiel-lemur Of course! Thanks for working with us through this issue. Hopefully it can be resolved soon. The (inconvenient) alternative is to try your code on a non-Linux based machine for now.

tomvothecoder avatar Apr 25 '24 16:04 tomvothecoder

Just an update about why this bug only happens on Linux (comment):

I suspect that the problem is that the operations with NaNs can give compiler-dependent results (hence the difference with gcc (linux) and clang (macosx)). I think the way to avoid this is to either raise an exception if nans or infs are found, or treat them as missing values (as PR #330 does).

tomvothecoder avatar May 01 '24 16:05 tomvothecoder

Hey! I see the issue was closed on Nctime, but I am not sure what the solution is for now? Would you be so kind and let me know please :)

ezekiel-lemur avatar May 13 '24 14:05 ezekiel-lemur

Hey! I see the issue was closed on Nctime, but I am not sure what the solution is for now? Would you be so kind and let me know please :)

Hi @ezekiel-lemur, here's a quick, temporary, hacky workaround to this issue on Linux.

It involves converting np.nan values to 0s, decoding to cftime.datetime, then converting the 0s (as cftime.datetime) back to np.nan values.

import cftime
import numpy as np
import xarray as xr
import xcdat as xc

# Open the dataset.
filepath = "subset.nc"
ds = xr.open_dataset(filepath, decode_times=False)

# _FillValue is `nan`, so we need to use another value (I choose 0)
print(ds.time.encoding["_FillValue"])

# Make sure no values are 0, which we use to represent missing values.
# If 0's are present, we will inadvertently drop actual time coordinate values
# and instead need to use another fill value.
# The print shows the shape is (100, 920), which is the same as before. 
# We can use 0 to represent missing values.
print(ds.time.where(ds.time != 0, drop=True).shape)

# Convert `np.nan` to 0 to represent missing values.
ds["time"] = ds.time.fillna(0)
ds["time"].attrs["axis"] = "time"
ds["time"].attrs["long_name"] = "time"
ds = ds.set_coords(("time"))

# Decode the time coordinates.
ds_decoded = xc.decode_time(ds)

# Convert the cftime.datetime value 0 back to `np.nan`.
fill_value = cftime.num2date([0], ds.time.units, calendar=ds.time.calendar)
ds_decoded["time"] = ds_decoded.time.where(
    ds_decoded.time != fill_value, np.nan, drop=False
)

Alternatively, you can wait for a new release of cftime that includes PR 330, or use MacOS to run your code (with the caveat being the behavior noted here).

tomvothecoder avatar May 15 '24 20:05 tomvothecoder

Just a note...it's not clear to me that things will work correctly after the cftime PR is merged (I'm not sure how xarray is going to handle masked arrays from cftime).

pochedls avatar May 15 '24 20:05 pochedls

Just a note...it's not clear to me that things will work correctly after the cftime PR is merged (I'm not sure how xarray is going to handle masked arrays from cftime).

I'm also not sure how Xarray handles masked arrays within xr.DataArrays. If this ends up breaking something in xarray, we'll need to open up an issue there. Maybe the masked array can automatically be converted to a numpy array with missing values represented by np.nan (using this code), before the decoded time xr.DataArray is built. We can implement this logic ourselves in xCDAT's _get_cf_time_coords().

tomvothecoder avatar May 15 '24 21:05 tomvothecoder