BUG: Series.plot.hist segfault for timedelta64
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
print(t)
dt = t.to_series().diff()
print(dt)
dt.plot.hist()
Issue Description
I get this crash from the dt.plot.hist() call:
% python crash.py
DatetimeIndex(['2021-01-01', '2021-01-03', '2021-01-04'], dtype='datetime64[ns]', freq=None)
2021-01-01 NaT
2021-01-03 2 days
2021-01-04 1 days
dtype: timedelta64[ns]
zsh: segmentation fault python crash.py
Expected Behavior
No segfault. Give a plot or some informative exception.
Installed Versions
This is with Python 3.9 on MacOS installed via conda-forge.
% cat environment.yml
name: cement
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python==3.9
  - jupyterlab==3.1
  - pandas==1.3
  - scikit-learn==1.0
  - missingno
  - matplotlib
  - seaborn==0.11
  - dtale==1.56
  - pandas-profiling
  - sweetviz==2.1
  - hcrystalball==0.1.10
  - pip
  - pip:
    - flaml
pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.9.0.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.3.0
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.28.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.19.0
xlrd : 2.0.1
xlwt : None
numba : 0.53.1
Thanks for reporting this @cdeil! Not sure whether there are also issues in the pandas
logic somewhere, but I can reproduce the segfault with NumPy alone:
import numpy as np
td_arr = np.array([], dtype="timedelta64[ns]")
np.result_type(0, td_arr)
Issue fixed upstream (very quickly!) - we should probably add a test, but we need to wait until NumPy 1.21.3 is released.
just needs a test to close
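One possible shape for such a test, as a rough sketch only and not the test pandas actually added: run the NumPy-only reproducer from this thread in a child interpreter and check that the process was not killed by a signal. The helper name `exits_without_signal` and the test name are hypothetical. The check assumes POSIX, where `subprocess` reports death by signal as a negative return code.

```python
import subprocess
import sys

# The NumPy-only reproducer from this thread, wrapped so that an ordinary
# exception counts as acceptable and only a hard crash fails the check.
REPRO = """
import numpy as np
td_arr = np.array([], dtype="timedelta64[ns]")
try:
    np.result_type(0, td_arr)
except TypeError:
    pass
"""


def exits_without_signal(code: str) -> bool:
    """Run `code` in a child interpreter (hypothetical helper).

    On POSIX, a negative return code means the child was terminated by a
    signal, for example SIGSEGV.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return proc.returncode >= 0


def test_result_type_timedelta_no_segfault():
    assert exits_without_signal(REPRO)
```

Running the reproducer in a subprocess keeps a crash from taking down the whole test session, at the cost of a slower test.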
@simonjayhawkins unfortunately, the provided example does not work.
>>> import pandas as pd
>>> t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
>>> dt = t.to_series().diff()
>>> print(dt)
2021-01-01 NaT
2021-01-03 2 days
2021-01-04 1 days
dtype: timedelta64[ns]
>>> dt.plot.hist()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/plotting/_core.py", line 1375, in hist
return self(kind="hist", by=by, bins=bins, **kwargs)
File "pandas/plotting/_core.py", line 1001, in __call__
return plot_backend.plot(data, kind=kind, **kwargs)
File "pandas/plotting/_matplotlib/__init__.py", line 71, in plot
plot_obj.generate()
File "pandas/plotting/_matplotlib/core.py", line 448, in generate
self._args_adjust()
File "pandas/plotting/_matplotlib/hist.py", line 74, in _args_adjust
self.bins = self._calculate_bins(self.data)
File "pandas/plotting/_matplotlib/hist.py", line 85, in _calculate_bins
hist, bins = np.histogram(
File "<__array_function__ internals>", line 180, in histogram
File "/numpy/lib/histograms.py", line 793, in histogram
bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
File "numpy/lib/histograms.py", line 443, in _get_bin_edges
bin_type = np.result_type(bin_type, float)
File "<__array_function__ internals>", line 180, in result_type
TypeError: The DType <class 'numpy.dtype[timedelta64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[timedelta64]'>, <class 'numpy.dtype[float64]'>)
Does this work as intended? Or should a separate issue be created?
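For anyone hitting the same TypeError in the meantime, here is a minimal sketch of a possible workaround, assuming a histogram over float days is acceptable. It first shows the promotion from the last frames of the traceback in isolation, then converts the timedeltas to plain floats before binning. The variable names `promotion_failed` and `days` are illustrative only.

```python
import numpy as np
import pandas as pd

t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
dt = t.to_series().diff()

# The promotion that np.histogram attempts according to the traceback:
# timedelta64 and float64 have no common dtype, so this raises TypeError.
try:
    np.result_type(np.dtype("timedelta64[ns]"), float)
    promotion_failed = False
except TypeError:
    promotion_failed = True

# Workaround sketch: express the timedeltas as float days and drop the
# leading NaT (which becomes NaN) before computing the histogram.
days = (dt.dt.total_seconds() / 86400).dropna()
counts, edges = np.histogram(days, bins=2)
```

The float Series `days` can then be handed to `days.plot.hist()` in place of `dt`, since the values are ordinary numbers by that point.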