
Pandas overflow error

Open vtikoo opened this issue 2 years ago • 1 comments

[2022-07-22T21:07:33.538Z] ________________ TestTimestampUnaryOps.test_round_sanity[ceil] _________________
[2022-07-22T21:07:33.538Z] 
[2022-07-22T21:07:33.538Z] self = <pandas.tests.scalar.timestamp.test_unary_ops.TestTimestampUnaryOps object at 0x18e165910>
[2022-07-22T21:07:33.538Z] method = <cyfunction Timestamp.ceil at 0x1b56adee0>
[2022-07-22T21:07:33.538Z] 
[2022-07-22T21:07:33.538Z]     @given(val=st.integers(iNaT + 1, lib.i8max))
[2022-07-22T21:07:33.538Z] >   @pytest.mark.parametrize(
[2022-07-22T21:07:33.538Z]         "method", [Timestamp.round, Timestamp.floor, Timestamp.ceil]
[2022-07-22T21:07:33.538Z]     )
[2022-07-22T21:07:33.538Z] 
[2022-07-22T21:07:33.538Z] /usr/local/lib/python3.9/site-packages/pandas/tests/scalar/timestamp/test_unary_ops.py:284: 
[2022-07-22T21:07:33.538Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-22T21:07:33.538Z] /usr/local/lib/python3.9/site-packages/pandas/tests/scalar/timestamp/test_unary_ops.py:332: in test_round_sanity
[2022-07-22T21:07:33.538Z]     res = method(ts, "D")
[2022-07-22T21:07:33.538Z] pandas/_libs/tslibs/timestamps.pyx:1710: in pandas._libs.tslibs.timestamps.Timestamp.ceil
[2022-07-22T21:07:33.538Z]     ???
[2022-07-22T21:07:33.538Z] pandas/_libs/tslibs/timestamps.pyx:1431: in pandas._libs.tslibs.timestamps.Timestamp._round
[2022-07-22T21:07:33.538Z]     ???
[2022-07-22T21:07:33.538Z] pandas/_libs/tslibs/fields.pyx:724: in pandas._libs.tslibs.fields.round_nsint64
[2022-07-22T21:07:33.538Z]     ???
[2022-07-22T21:07:33.539Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-22T21:07:33.539Z] 
[2022-07-22T21:07:33.539Z] >   ???
[2022-07-22T21:07:33.539Z] E   OverflowError: Python int too large to convert to C long
[2022-07-22T21:07:33.539Z] 
[2022-07-22T21:07:33.539Z] pandas/_libs/tslibs/fields.pyx:688: OverflowError
[2022-07-22T21:07:33.539Z] ---------------------------------- Hypothesis ----------------------------------
[2022-07-22T21:07:33.539Z] Falsifying example: test_round_sanity(
[2022-07-22T21:07:33.539Z]     val=9223286400000000001,
[2022-07-22T21:07:33.539Z]     self=<pandas.tests.scalar.timestamp.test_unary_ops.TestTimestampUnaryOps at 0x18e165910>,
[2022-07-22T21:07:33.539Z]     method=<cyfunction Timestamp.ceil at 0x1b56adee0>,
[2022-07-22T21:07:33.539Z] )

The following tests failed with the same error:

[2022-07-22T21:10:32.039Z] =========================== short test summary info ============================
[2022-07-22T21:10:32.039Z] FAILED ../usr/local/lib/python3.9/site-packages/pandas/tests/scalar/timedelta/test_timedelta.py::TestTimedeltas::test_round_sanity[round]
[2022-07-22T21:10:32.039Z] FAILED ../usr/local/lib/python3.9/site-packages/pandas/tests/scalar/timestamp/test_unary_ops.py::TestTimestampUnaryOps::test_round_sanity[floor]
[2022-07-22T21:10:32.040Z] FAILED ../usr/local/lib/python3.9/site-packages/pandas/tests/scalar/timestamp/test_unary_ops.py::TestTimestampUnaryOps::test_round_sanity[ceil]

PR #1390 works around this by pinning the version of the hypothesis Python package.

vtikoo, Jul 25 '22 19:07

TL;DR: The overflow also occurs for the falsifying example above on native Linux.

The pandas test suite uses hypothesis, which provides decorators for property-based testing.

A recent code change in hypothesis makes it generate larger boundary values.

This causes some pandas tests to overflow.

Output on Linux for equivalent code:

root@76e5635496f5:/# python3
Python 3.9.13 (main, Aug  2 2022, 11:20:39)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> val = 9223286400000000001
>>>
>>> from pandas import Timedelta
>>> import numpy as np
>>> val2 = np.int64(val)
>>> val2
9223286400000000001
>>> def ceil_fun(val):
...     val2 = np.int64(val)
...     td = Timedelta(val2)
...     print(val)
...     res = Timedelta.ceil(td, "D")
...     print(res)
...
>>>
>>> ceil_fun(100)
100
1 days 00:00:00
>>> ceil_fun(9223286400000000001)
9223286400000000001
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in ceil_fun
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1439, in pandas._libs.tslibs.timedeltas.Timedelta.ceil
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1397, in pandas._libs.tslibs.timedeltas.Timedelta._round
  File "pandas/_libs/tslibs/fields.pyx", line 724, in pandas._libs.tslibs.fields.round_nsint64
  File "pandas/_libs/tslibs/fields.pyx", line 688, in pandas._libs.tslibs.fields._ceil_int64
OverflowError: Python int too large to convert to C long
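
The arithmetic behind the overflow can be checked without pandas. The following is a minimal sketch in plain Python, whose integers are unbounded; `ceil_to_unit` is a hypothetical helper written for illustration and is not the pandas implementation. It suggests that the falsifying value is one nanosecond past a whole number of days, so rounding it up to the next day gives a result larger than the int64 maximum.

```python
# Plain-Python sketch (no pandas needed): Python ints are unbounded, so the
# rounded value can be computed and then compared against the int64 limit.
I8_MAX = 2**63 - 1                   # 9223372036854775807, largest int64
NS_PER_DAY = 24 * 60 * 60 * 10**9    # nanoseconds in one day


def ceil_to_unit(value: int, unit: int) -> int:
    """Round value up to the nearest multiple of unit (hypothetical helper)."""
    return -(-value // unit) * unit


val = 9223286400000000001            # falsifying example from the log above
ceiled = ceil_to_unit(val, NS_PER_DAY)

print(val // NS_PER_DAY, val % NS_PER_DAY)  # 106751 1
print(ceiled)                               # 9223372800000000000
print(ceiled > I8_MAX)                      # True
```

The input itself fits in an int64, but the next day boundary does not, which is consistent with the OverflowError being raised inside the rounding step rather than when the Timedelta is constructed.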

vtikoo, Aug 04 '22 01:08