BUG: pandas.to_datetime fails to handle numpy.nan on riscv64 due to dependency on undefined behaviour
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
import numpy
print(pandas.to_datetime(numpy.nan, unit="s"))
Issue Description
Converting a floating-point value to an integer type is undefined behaviour when the value is out of range for that integer type; see section 6.3.1.4, "Real floating and integer", of the C standard.
You can use gcc92.fsffrance.org for your tests if you don't have your own hardware, or use qemu with the images from https://download.opensuse.org/ports/riscv/tumbleweed/images/.
$ python3 -c 'import pandas
import numpy
print(pandas.to_datetime(numpy.nan, unit="s"))'
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "/usr/lib64/python3.10/site-packages/pandas/core/tools/datetimes.py", line 1078, in to_datetime
result = convert_listlike(np.array([arg]), format)[0]
File "/usr/lib64/python3.10/site-packages/pandas/core/tools/datetimes.py", line 357, in _convert_listlike_datetimes
return _to_datetime_with_unit(arg, unit, name, tz, errors)
File "/usr/lib64/python3.10/site-packages/pandas/core/tools/datetimes.py", line 530, in _to_datetime_with_unit
arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)
File "pandas/_libs/tslib.pyx", line 266, in pandas._libs.tslib.array_with_unit_to_datetime
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with unit 's'
Expected Behavior
No error.
Installed Versions
/usr/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 6.0.0-rc5-38-default
Version : #1 SMP Mon Sep 12 15:18:20 UTC 2022 (005845a)
machine : riscv64
processor : riscv64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.4.3
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : 1.3.5
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
Hi, thanks for your report. This works on 1.4.3 and main for me. Can you please recheck that you have pandas 1.4.3 installed?
Of course I have, why do you ask?
Because it works for me on 1.4.3, 1.4.4 and main
How did you test that?
I executed the code snippet from your post?
Where did you execute it? In qemu or on real hardware?
macOS
I executed it on macOS.
What hardware?
There is macos for RISC-V???
Did you actually read the report?
Please use gcc92.fsffrance.org for your tests if you don't have your own hardware.
There is a code snippet, nothing else. I read the versions, but if you think that this is specific to your hardware, a small explanation would have been nice. Your report reads like a general issue, which is not the case.
could you adjust your issue title and add a small explanation?
you could also try debugging it yourself if you are interested
You can also use one of the images in https://download.opensuse.org/ports/riscv/tumbleweed/images/ with qemu.
This is a general issue because it depends on undefined behaviour (converting NaN value to integer).
It works on my machine, so seems to be hardware dependent
Depending on undefined behaviour is a bug.
$ python3 -c $'import numpy\nprint(numpy.asarray(numpy.nan).astype("i8"))'
9223372036854775807
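For readers following the thread, here is a minimal sketch of how the float-to-integer cast could be guarded so that NaN never reaches it. This is not the pandas implementation: `float_seconds_to_int64` and `INT64_NAT` are hypothetical names chosen for illustration. The only assumption about numpy is that the minimum int64 value is the integer it uses to represent NaT. On x86-64 the unguarded cast commonly yields that same value, which may be why the problem stays hidden there.

```python
import numpy as np

# Hypothetical name: the int64 value that numpy's datetime64 uses for NaT.
INT64_NAT = np.iinfo(np.int64).min


def float_seconds_to_int64(values):
    """Cast floats to int64, mapping NaN/inf/out-of-range values to INT64_NAT.

    Illustrative helper only, not part of pandas or numpy.
    """
    arr = np.atleast_1d(np.asarray(values, dtype="f8"))
    # Start with every slot marked as "missing".
    out = np.full(arr.shape, INT64_NAT, dtype="i8")
    # Only values that are finite and representable in int64 are cast,
    # so the cast itself never sees NaN or an out-of-range float.
    ok = np.isfinite(arr) & (arr >= -2.0**63) & (arr < 2.0**63)
    out[ok] = arr[ok].astype("i8")
    return out


result = float_seconds_to_int64([1.0, float("nan"), 2.5])
print(result)
```

With this approach the result for NaN is chosen explicitly by the code instead of being left to whatever the platform's conversion instruction happens to return.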
@andreas-schwab we don't support this hardware in any way
you can submit a patch if you can find the problem
This has nothing to do with hardware support. This is undefined behaviour. Depending on undefined behaviour is a serious bug.
You are welcome to submit a pr, if you can identify the bug and provide a fix.
@andreas-schwab could you please clarify? What commits is this diff between, what's it meant to show? I've formatted your code to make it easier to read, but - apologies for not understanding - I still don't see your point. Could you clarify what exactly you're expecting pandas to do?
Did you actually read the report?
please be respectful
Your reaction to the bug report has been far from respectful so far.
Could you please update the top post with steps to reproduce the bug on Windows/Ubuntu/macOS? This will help someone who wants to work on this. We have tests covering this case in CI, so simply executing the code snippet won't be sufficient.
Additionally, it would be great if you could add an explanation on what you are referring to with undefined behaviour, this is not clear to me. Some context to the code snippet you posted earlier would also be helpful.
Converting a floating-point value to an integer type is undefined behaviour when the value is out of range for that integer type; see section 6.3.1.4, "Real floating and integer", of the C standard.
You can use gcc92.fsffrance.org for your tests if you don't have your own hardware, or use qemu with the images from https://download.opensuse.org/ports/riscv/tumbleweed/images/.
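To make the rule quoted above concrete, here is a small pure-Python sketch of a conversion that is defined for every input: it checks whether the truncated value fits into a 64-bit signed integer before converting. The helper name `checked_float_to_int64` is an assumption made for this example and does not exist in pandas or numpy.

```python
import math

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_float_to_int64(x):
    """Return x truncated toward zero, or None if it does not fit in int64.

    Hypothetical helper for illustration. In C, casting NaN, infinity, or an
    out-of-range double to int64_t is undefined; here those cases are
    detected and reported instead.
    """
    if not math.isfinite(x):
        # NaN and +/-inf have no integral part to represent.
        return None
    truncated = math.trunc(x)
    if truncated < INT64_MIN or truncated > INT64_MAX:
        return None
    return truncated


for value in (1.9, -1.9, float("nan"), float("inf"), 2.0**63):
    print(value, "->", checked_float_to_int64(value))
```

Python's own `int()` behaves in a similar spirit: it raises `ValueError` for NaN and `OverflowError` for infinity instead of returning an arbitrary number.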
I copied it into the top post in case someone wants to work on this