Handle NaNs when decoding times (failures on riscv64)
What happened?
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_roundtrip_numpy_datetime_data
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_numpy_datetime_data
= 4 failed, 4636 passed, 5632 skipped, 19 xfailed, 22 xpassed, 38 warnings in 266.18s (0:04:26) =
What did you expect to happen?
No failures
Minimal Complete Verifiable Example
pytest-3.10 -n auto /usr/lib/python3.10/site-packages/xarray
MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
=================================== FAILURES ===================================
___________ TestScipyInMemoryData.test_roundtrip_numpy_datetime_data ___________
[gw2] linux -- Python 3.10.7 /usr/bin/python3.10
num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None
def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
"""Given an array of numeric dates in netCDF format, convert it into a
numpy array of date time objects.
For standard (Gregorian) calendars, this function uses vectorized
operations, which makes it much faster than cftime.num2date. In such a
case, the returned array will be of type np.datetime64.
Note that time unit in `units` must not be smaller than microseconds and
not larger than days.
See Also
--------
cftime.num2date
"""
num_dates = np.asarray(num_dates)
flat_num_dates = num_dates.ravel()
if calendar is None:
calendar = "standard"
if use_cftime is None:
try:
> dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:270:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
flat_num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'
def _decode_datetime_with_pandas(flat_num_dates, units, calendar):
if not _is_standard_calendar(calendar):
raise OutOfBoundsDatetime(
"Cannot decode times from a non-standard calendar, {!r}, using "
"pandas.".format(calendar)
)
delta, ref_date = _unpack_netcdf_time_units(units)
delta = _netcdf_to_numpy_timeunit(delta)
try:
ref_date = pd.Timestamp(ref_date)
except ValueError:
# ValueError is raised by pd.Timestamp for non-ISO timestamp
# strings, in which case we fall back to using cftime
raise OutOfBoundsDatetime
with warnings.catch_warnings():
warnings.filterwarnings("ignore", "invalid value encountered", RuntimeWarning)
pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
pd.to_timedelta(flat_num_dates.max(), delta) + ref_date
# To avoid integer overflow when converting to nanosecond units for integer
# dtypes smaller than np.int64 cast all integer and unsigned integer dtype
# arrays to np.int64 (GH 2002, GH 6589). Note this is safe even in the case
# of np.uint64 values, because any np.uint64 value that would lead to
# overflow when converting to np.int64 would not be representable with a
# timedelta64 value, and therefore would raise an error in the lines above.
if flat_num_dates.dtype.kind in "iu":
flat_num_dates = flat_num_dates.astype(np.int64)
# Cast input ordinals to integers of nanoseconds because pd.to_timedelta
# works much faster when dealing with integers (GH 1399).
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
np.int64
)
# Use pd.to_timedelta to safely cast integer values to timedeltas,
# and add those to a Timestamp to safely produce a DatetimeIndex. This
# ensures that we do not encounter integer overflow at any point in the
# process without raising OutOfBoundsDatetime.
> return (pd.to_timedelta(flat_num_dates_ns_int, "ns") + ref_date).values
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:245:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')
@wraps(method)
def new_method(self, other):
if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
# For comparison ops, Index does *not* defer to Series
pass
else:
for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
if isinstance(self, cls):
break
if isinstance(other, cls):
return NotImplemented
other = item_from_zerodim(other)
> return method(self, other)
/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00')
@unpack_zerodim_and_defer("__add__")
def __add__(self, other):
> return self._arith_method(other, operator.add)
/usr/lib64/python3.10/site-packages/pandas/core/arraylike.py:100:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>
def _arith_method(self, other, op):
if (
isinstance(other, Index)
and is_object_dtype(other.dtype)
and type(other) is not Index
):
# We return NotImplemented for object-dtype index *subclasses* so they have
# a chance to implement ops before we unwrap them.
# See https://github.com/pandas-dev/pandas/issues/31109
return NotImplemented
> return super()._arith_method(other, op)
/usr/lib64/python3.10/site-packages/pandas/core/indexes/base.py:6734:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
other = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>
def _arith_method(self, other, op):
res_name = ops.get_op_result_name(self, other)
lvalues = self._values
rvalues = extract_array(other, extract_numpy=True, extract_range=True)
rvalues = ops.maybe_prepare_scalar_for_op(rvalues, lvalues.shape)
rvalues = ensure_wrapped_if_datetimelike(rvalues)
with np.errstate(all="ignore"):
> result = ops.arithmetic_op(lvalues, rvalues, op)
/usr/lib64/python3.10/site-packages/pandas/core/base.py:1295:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
left = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
right = Timestamp('2000-01-01 00:00:00'), op = <built-in function add>
def arithmetic_op(left: ArrayLike, right: Any, op):
"""
Evaluate an arithmetic operation `+`, `-`, `*`, `/`, `//`, `%`, `**`, ...
Note: the caller is responsible for ensuring that numpy warnings are
suppressed (with np.errstate(all="ignore")) if needed.
Parameters
----------
left : np.ndarray or ExtensionArray
right : object
Cannot be a DataFrame or Index. Series is *not* excluded.
op : {operator.add, operator.sub, ...}
Or one of the reversed variants from roperator.
Returns
-------
ndarray or ExtensionArray
Or a 2-tuple of these in the case of divmod or rdivmod.
"""
# NB: We assume that extract_array and ensure_wrapped_if_datetimelike
# have already been called on `left` and `right`,
# and `maybe_prepare_scalar_for_op` has already been called on `right`
# We need to special-case datetime64/timedelta64 dtypes (e.g. because numpy
# casts integer dtypes to timedelta64 when operating with timedelta64 - GH#22390)
if (
should_extension_dispatch(left, right)
or isinstance(right, (Timedelta, BaseOffset, Timestamp))
or right is NaT
):
# Timedelta/Timestamp and other custom scalars are included in the check
# because numexpr will fail on it, see GH#31457
> res_values = op(left, right)
/usr/lib64/python3.10/site-packages/pandas/core/ops/array_ops.py:216:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')
@wraps(method)
def new_method(self, other):
if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
# For comparison ops, Index does *not* defer to Series
pass
else:
for cls in [ABCDataFrame, ABCSeries, ABCIndex]:
if isinstance(self, cls):
break
if isinstance(other, cls):
return NotImplemented
other = item_from_zerodim(other)
> return method(self, other)
/usr/lib64/python3.10/site-packages/pandas/core/ops/common.py:70:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')
@unpack_zerodim_and_defer("__add__")
def __add__(self, other):
other_dtype = getattr(other, "dtype", None)
# scalar others
if other is NaT:
result = self._add_nat()
elif isinstance(other, (Tick, timedelta, np.timedelta64)):
result = self._add_timedeltalike_scalar(other)
elif isinstance(other, BaseOffset):
# specifically _not_ a Tick
result = self._add_offset(other)
elif isinstance(other, (datetime, np.datetime64)):
> result = self._add_datetimelike_scalar(other)
/usr/lib64/python3.10/site-packages/pandas/core/arrays/datetimelike.py:1264:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <TimedeltaArray>
['0 days 00:00:00', '106751 days 23:47:16.854775807']
Length: 2, dtype: timedelta64[ns]
other = Timestamp('2000-01-01 00:00:00')
def _add_datetimelike_scalar(self, other) -> DatetimeArray:
# adding a timedeltaindex to a datetimelike
from pandas.core.arrays import DatetimeArray
assert other is not NaT
other = Timestamp(other)
if other is NaT:
# In this case we specifically interpret NaT as a datetime, not
# the timedelta interpretation we would get by returning self + NaT
result = self.asi8.view("m8[ms]") + NaT.to_datetime64()
return DatetimeArray(result)
i8 = self.asi8
> result = checked_add_with_arr(i8, other.value, arr_mask=self._isnan)
/usr/lib64/python3.10/site-packages/pandas/core/arrays/timedeltas.py:482:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
arr = array([ 0, 9223372036854775807]), b = 946684800000000000
arr_mask = array([False, False]), b_mask = None
def checked_add_with_arr(
arr: np.ndarray,
b,
arr_mask: npt.NDArray[np.bool_] | None = None,
b_mask: npt.NDArray[np.bool_] | None = None,
) -> np.ndarray:
"""
Perform array addition that checks for underflow and overflow.
Performs the addition of an int64 array and an int64 integer (or array)
but checks that they do not result in overflow first. For elements that
are indicated to be NaN, whether or not there is overflow for that element
is automatically ignored.
Parameters
----------
arr : array addend.
b : array or scalar addend.
arr_mask : np.ndarray[bool] or None, default None
array indicating which elements to exclude from checking
b_mask : np.ndarray[bool] or None, default None
array or scalar indicating which element(s) to exclude from checking
Returns
-------
sum : An array for elements x + b for each element x in arr if b is
a scalar or an array for elements x + y for each element pair
(x, y) in (arr, b).
Raises
------
OverflowError if any x + y exceeds the maximum or minimum int64 value.
"""
# For performance reasons, we broadcast 'b' to the new array 'b2'
# so that it has the same size as 'arr'.
b2 = np.broadcast_to(b, arr.shape)
if b_mask is not None:
# We do the same broadcasting for b_mask as well.
b2_mask = np.broadcast_to(b_mask, arr.shape)
else:
b2_mask = None
# For elements that are NaN, regardless of their value, we should
# ignore whether they overflow or not when doing the checked add.
if arr_mask is not None and b2_mask is not None:
not_nan = np.logical_not(arr_mask | b2_mask)
elif arr_mask is not None:
not_nan = np.logical_not(arr_mask)
elif b_mask is not None:
# Argument 1 to "__call__" of "_UFunc_Nin1_Nout1" has incompatible type
# "Optional[ndarray[Any, dtype[bool_]]]"; expected
# "Union[_SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[An
# y]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool,
# int, float, complex, str, bytes]]]" [arg-type]
not_nan = np.logical_not(b2_mask) # type: ignore[arg-type]
else:
not_nan = np.empty(arr.shape, dtype=bool)
not_nan.fill(True)
# gh-14324: For each element in 'arr' and its corresponding element
# in 'b2', we check the sign of the element in 'b2'. If it is positive,
# we then check whether its sum with the element in 'arr' exceeds
# np.iinfo(np.int64).max. If so, we have an overflow error. If it
# it is negative, we then check whether its sum with the element in
# 'arr' exceeds np.iinfo(np.int64).min. If so, we have an overflow
# error as well.
i8max = lib.i8max
i8min = iNaT
mask1 = b2 > 0
mask2 = b2 < 0
if not mask1.any():
to_raise = ((i8min - b2 > arr) & not_nan).any()
elif not mask2.any():
to_raise = ((i8max - b2 < arr) & not_nan).any()
else:
to_raise = ((i8max - b2[mask1] < arr[mask1]) & not_nan[mask1]).any() or (
(i8min - b2[mask2] > arr[mask2]) & not_nan[mask2]
).any()
if to_raise:
> raise OverflowError("Overflow in int64 addition")
E OverflowError: Overflow in int64 addition
/usr/lib64/python3.10/site-packages/pandas/core/algorithms.py:1114: OverflowError
During handling of the above exception, another exception occurred:
data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None
def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
# Verify that at least the first and last date can be decoded
# successfully. Otherwise, tracebacks end up swallowed by
# Dataset.__repr__ when users try to view their lazily decoded array.
values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
example_value = np.concatenate(
[first_n_items(values, 1) or [0], last_item(values) or [0]]
)
try:
> result = decode_cf_datetime(example_value, units, calendar, use_cftime)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:180:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian', use_cftime = None
def decode_cf_datetime(num_dates, units, calendar=None, use_cftime=None):
"""Given an array of numeric dates in netCDF format, convert it into a
numpy array of date time objects.
For standard (Gregorian) calendars, this function uses vectorized
operations, which makes it much faster than cftime.num2date. In such a
case, the returned array will be of type np.datetime64.
Note that time unit in `units` must not be smaller than microseconds and
not larger than days.
See Also
--------
cftime.num2date
"""
num_dates = np.asarray(num_dates)
flat_num_dates = num_dates.ravel()
if calendar is None:
calendar = "standard"
if use_cftime is None:
try:
dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
except (KeyError, OutOfBoundsDatetime, OutOfBoundsTimedelta, OverflowError):
> dates = _decode_datetime_with_cftime(
flat_num_dates.astype(float), units, calendar
)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:272:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
num_dates = array([ 0., nan]), units = 'days since 2000-01-01 00:00:00'
calendar = 'proleptic_gregorian'
def _decode_datetime_with_cftime(num_dates, units, calendar):
if cftime is None:
> raise ModuleNotFoundError("No module named 'cftime'")
E ModuleNotFoundError: No module named 'cftime'
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:199: ModuleNotFoundError
During handling of the above exception, another exception occurred:
self = <xarray.tests.test_backends.TestScipyInMemoryData object at 0x4010bfceb0>
@arm_xfail
def test_roundtrip_numpy_datetime_data(self):
times = pd.to_datetime(["2000-01-01", "2000-01-02", "NaT"])
expected = Dataset({"t": ("t", times), "t0": times[0]})
kwargs = {"encoding": {"t0": {"units": "days since 1950-01-01"}}}
> with self.roundtrip(expected, save_kwargs=kwargs) as actual:
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:510:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.10/contextlib.py:135: in __enter__
return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:286: in roundtrip
with self.open(path, **open_kwargs) as ds:
/usr/lib64/python3.10/contextlib.py:135: in __enter__
return next(self.gen)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/tests/test_backends.py:312: in open
with open_dataset(path, engine=self.engine, **kwargs) as ds:
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/api.py:531: in open_dataset
backend_ds = backend.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/scipy_.py:285: in open_dataset
ds = store_entrypoint.open_dataset(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/backends/store.py:29: in open_dataset
vars, attrs, coord_names = conventions.decode_cf_variables(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:521: in decode_cf_variables
new_vars[k] = decode_cf_variable(
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/conventions.py:369: in decode_cf_variable
var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:682: in decode
dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
data = <xarray.backends.scipy_.ScipyArrayWrapper object at 0x40238999c0>
units = 'days since 2000-01-01 00:00:00', calendar = 'proleptic_gregorian'
use_cftime = None
def _decode_cf_datetime_dtype(data, units, calendar, use_cftime):
# Verify that at least the first and last date can be decoded
# successfully. Otherwise, tracebacks end up swallowed by
# Dataset.__repr__ when users try to view their lazily decoded array.
values = indexing.ImplicitToExplicitIndexingAdapter(indexing.as_indexable(data))
example_value = np.concatenate(
[first_n_items(values, 1) or [0], last_item(values) or [0]]
)
try:
result = decode_cf_datetime(example_value, units, calendar, use_cftime)
except Exception:
calendar_msg = (
"the default calendar" if calendar is None else f"calendar {calendar!r}"
)
msg = (
f"unable to decode time units {units!r} with {calendar_msg!r}. Try "
"opening your dataset with decode_times=False or installing cftime "
"if it is not installed."
)
> raise ValueError(msg)
E ValueError: unable to decode time units 'days since 2000-01-01 00:00:00' with "calendar 'proleptic_gregorian'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.
/home/abuild/rpmbuild/BUILD/xarray-2022.6.0/xarray/coding/times.py:190: ValueError
Anything else we need to know?
https://build.opensuse.org/package/live_build_log/openSUSE:Factory:RISCV/python-xarray/standard/riscv64
import xarray as xr
import numpy as np
import pandas as pd
num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
array([ 0, 9223372036854775807])
pd.to_timedelta(flat_num_dates_ns_int, "ns")
TimedeltaIndex(['0 days 00:00:00', '106751 days 23:47:16.854775807'], dtype='timedelta64[ns]', freq=None)
pd.to_timedelta(flat_num_dates, "ns")
TimedeltaIndex(['0 days', NaT], dtype='timedelta64[ns]', freq=None)
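The OverflowError in the log follows directly from these numbers. As a rough sketch, the check below repeats the comparison that `checked_add_with_arr` makes when `b` is positive, using plain Python integers so nothing can wrap around. The variable names are only illustrative; the two values are copied from the traceback (`arr` and `b`).

```python
import numpy as np

I8_MAX = np.iinfo(np.int64).max  # 9223372036854775807

# Second element of `arr` in the traceback: what NaN became after the int64 cast on riscv64.
nan_as_int = 9223372036854775807
# `b` in the traceback: nanoseconds from the Unix epoch to 2000-01-01 00:00:00.
ref_date_ns = 946684800000000000

# pandas raises when i8max - b < arr for a positive b.
would_overflow = (I8_MAX - ref_date_ns) < nan_as_int
print(would_overflow)  # True
```

Because the NaN was cast to the largest int64 value, adding any positive reference date must exceed the int64 range, which is why the pandas path fails before cftime is even tried.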
Environment
/usr/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit: None
python: 3.10.7 (main, Sep 11 2022, 08:41:56) [GCC]
python-bits: 64
OS: Linux
OS-release: 5.19.10-1-default
machine: riscv64
processor: riscv64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.21.6
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.0.4
conda: None
pytest: 7.1.2
IPython: 8.5.0
sphinx: None
It looks like many of these occur in pandas code. Do the pandas tests pass?
On Sep 28 2022, Maximilian Roos wrote:
It looks like many of these occur in pandas code. Do the pandas tests pass?
That's because xarray is passing bogus values.
What are the bogus values?
Please could you answer the question on whether pandas tests pass?
array([ 0, 9223372036854775807])
I'm not sure what that has to do with xarray though? Does this give the same result?
import numpy as np
import pandas as pd
num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
array([ 0, 9223372036854775807])
Please could you answer the question on whether pandas tests pass?
We're here helping as volunteers; we can only engage on issues if you reciprocate our good faith. Please could you answer this?
Closing but please feel free to reopen
Hi, we are getting similar failures when building xarray for Arch Linux riscv64.
I'm not sure what that has to do with xarray though? Does this give the same result?
import numpy as np
import pandas as pd
num_dates = np.asarray([0., np.nan])
flat_num_dates = num_dates.ravel()
flat_num_dates_ns_int = (flat_num_dates * (int(1e9) * 60 * 60 * 24)).astype(np.int64)
flat_num_dates_ns_int
array([ 0, 9223372036854775807])
I got the same result on riscv64. My guess is that the sign bit of NaN is not preserved during the conversion. More details can be found at: https://sourceware.org/pipermail/libc-alpha/2022-September/142011.html
Repeating the same steps results in array([0, -9223372036854775808]) on x86_64 and array([0, 0]) on aarch64.
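To make the platform difference concrete, the three results reported above can be reinterpreted as timedelta64[ns] without running the platform-dependent cast itself. This is only an illustrative sketch: `observed` and `as_timedelta` are hypothetical names, and the values are copied from this thread rather than computed. The C standard leaves the conversion of NaN to an integer type undefined, which is consistent with each architecture returning something different.

```python
import numpy as np

# Results of casting [0., nan] to int64, as reported in this thread per architecture.
observed = {
    "riscv64": np.array([0, 9223372036854775807], dtype=np.int64),
    "x86_64": np.array([0, -9223372036854775808], dtype=np.int64),
    "aarch64": np.array([0, 0], dtype=np.int64),
}

# Reinterpret the same 64 bits as nanosecond timedeltas (no conversion happens here).
as_timedelta = {
    machine: ints.view("timedelta64[ns]") for machine, ints in observed.items()
}

for machine, deltas in as_timedelta.items():
    print(machine, deltas, "NaT mask:", np.isnat(deltas))
```

Only the x86_64 value coincides with the int64 minimum, which NumPy and pandas use as the NaT sentinel, so the missing value survives there by coincidence. The riscv64 value is read as the largest possible timedelta, and the aarch64 value as a zero offset, which would decode to the reference date instead of NaT.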
Please could you answer the question on whether pandas tests pass?
I have tried pandas' tests and got many failures like:
E AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="date") are different
E
E Attribute "dtype" are different
E [left]: float64
E [right]: datetime64[ns]
or
E AssertionError: numpy array are different
E
E numpy array values are different (50.0 %)
E [index]: [0, 1]
E [left]: [1036713600000, -9223372036854775808]
E [right]: [1036713600000000000, -9223372036854775808]
Quite a few of the tests involve NaN as well, so you are probably right that the problem lies in pandas or numpy.
I got the same result on riscv64. My guess is that the sign bit of NaN is not preserved during the conversion. More details can be found at
Thanks for trying that. Notably, that code doesn't involve xarray at all. I'm keen to be part of the solution, but this doesn't look like a problem with xarray code specifically. Let me know if that makes sense.
As in #7098
I think the real solution here is to explicitly handle NaNs during the decoding step. We do want these to be NaT in the output.
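A minimal sketch of what explicit NaN handling could look like, assuming units of days and using NumPy only. The helper name `decode_days_nan_safe` and the constant `NS_PER_DAY` are hypothetical and not part of xarray, and this is not necessarily how the actual fix is implemented. The idea is to mask NaNs before the float-to-int cast, so the cast is well defined on every architecture, and then write NaT back explicitly.

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9  # nanoseconds per day


def decode_days_nan_safe(num_dates, ref_date):
    """Hypothetical helper: decode 'days since ref_date' floats, mapping NaN to NaT."""
    num_dates = np.asarray(num_dates, dtype=np.float64)
    nan_mask = np.isnan(num_dates)
    # Replace NaN with 0 so the float -> int64 cast never sees a NaN.
    safe = np.where(nan_mask, 0.0, num_dates)
    ns = (safe * NS_PER_DAY).astype(np.int64)
    deltas = ns.astype("timedelta64[ns]")
    # Restore the missing values explicitly instead of relying on the cast.
    deltas[nan_mask] = np.timedelta64("NaT")
    return np.datetime64(ref_date, "ns") + deltas


decoded = decode_days_nan_safe([0.0, 1.5, np.nan], "2000-01-01")
print(decoded)
```

Unlike the pandas-based path in the traceback, this sketch performs no overflow checking for out-of-range dates, so a real implementation would still need that safeguard.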
@felixonmars If you are still working on this, I'd appreciate it if you could test against #7827. Thanks.
@kmuehlbauer Sure. I have verified that the tests are passing on #7827 and failing on the current main branch.