
BUG: Series.plot.hist segfault for timedelta64

Status: Open. cdeil opened this issue 3 years ago (3 comments).

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
print(t)

dt = t.to_series().diff()
print(dt)

dt.plot.hist()

Issue Description

I get this crash from the dt.plot.hist() call:

% python crash.py
DatetimeIndex(['2021-01-01', '2021-01-03', '2021-01-04'], dtype='datetime64[ns]', freq=None)
2021-01-01      NaT
2021-01-03   2 days
2021-01-04   1 days
dtype: timedelta64[ns]
zsh: segmentation fault  python crash.py

Expected Behavior

No segfault: either a plot or an informative exception.
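A minimal sketch of the "informative exception" option, assuming the check runs before any binning happens. check_hist_input is a hypothetical helper, not part of pandas. It only inspects the Series dtype (NumPy dtype kind "m" means timedelta64) and raises a TypeError that suggests a conversion route instead of letting the crash happen further down.

```python
import pandas as pd


def check_hist_input(data: pd.Series) -> None:
    """Hypothetical pre-check (not a pandas API): reject timedelta64 data early."""
    # NumPy dtype kind "m" identifies timedelta64 data of any unit.
    if data.dtype.kind == "m":
        raise TypeError(
            f"hist() does not support {data.dtype} data; convert it to numbers "
            "first, e.g. data.dt.total_seconds()"
        )


# Same data as the reproducible example above.
t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
dt = t.to_series().diff()

try:
    check_hist_input(dt)
except TypeError as exc:
    print(exc)
```

Numeric Series pass through unchanged; only timedelta64 input is rejected, so the user gets a readable error rather than a segfault.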

Installed Versions

This is with Python 3.9 on macOS, installed via conda-forge.

% cat environment.yml
name: cement
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python==3.9
  - jupyterlab==3.1
  - pandas==1.3
  - scikit-learn==1.0
  - missingno
  - matplotlib
  - seaborn==0.11
  - dtale==1.56
  - pandas-profiling
  - sweetviz==2.1
  - hcrystalball==0.1.10
  - pip
  - pip:
    - flaml

pd.show_versions()

INSTALLED VERSIONS

commit           : f00ed8f47020034e752baf0250483053340971b0
python           : 3.9.0.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.6.0
Version          : Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 1.3.0
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.2.4
setuptools       : 58.0.4
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.28.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : 0.19.0
xlrd             : 2.0.1
xlwt             : None
numba            : 0.53.1

cdeil, Oct 09 '21 11:10

Thanks for reporting this @cdeil! I'm not sure whether there are issues in the pandas logic somewhere as well, but I can reproduce the segfault with NumPy alone:

import numpy as np

td_arr = np.array([], dtype="timedelta64[ns]")
np.result_type(0, td_arr)  # segfaults on NumPy 1.21.2

mzeitlin11, Oct 09 '21 15:10

Issue fixed upstream (very quickly!). We should probably add a test, but need to wait until NumPy 1.21.3 is released.

mzeitlin11, Oct 11 '21 19:10


This just needs a test to close.

simonjayhawkins, Aug 05 '22 18:08
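A sketch of such a regression test at the NumPy level, reusing the NumPy-only reproducer from earlier in the thread and assuming NumPy 1.21.3 or later (on 1.21.2 the call crashes the interpreter instead of failing the test). The test name is illustrative, not an existing pandas or NumPy test. The assertion relies on NumPy promoting an integer with timedelta64 to a timedelta64 dtype. A pandas-level test would call dt.plot.hist() instead, which also requires matplotlib.

```python
import numpy as np


def test_result_type_int_with_timedelta_array():
    # Illustrative regression test: this call segfaulted on NumPy 1.21.2.
    td_arr = np.array([], dtype="timedelta64[ns]")
    result = np.result_type(0, td_arr)
    # Integers promote to timedelta64, so a timedelta dtype (kind "m") is expected.
    assert result.kind == "m"
```

Under pytest the function is collected automatically; it can also be called directly.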

@simonjayhawkins Unfortunately, the provided example still does not work. It no longer segfaults, but dt.plot.hist() now raises a TypeError:

>>> import pandas as pd
>>> t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
>>> dt = t.to_series().diff()
>>> print(dt)
2021-01-01      NaT
2021-01-03   2 days
2021-01-04   1 days
dtype: timedelta64[ns]
>>> dt.plot.hist()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/plotting/_core.py", line 1375, in hist
    return self(kind="hist", by=by, bins=bins, **kwargs)
  File "pandas/plotting/_core.py", line 1001, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "pandas/plotting/_matplotlib/__init__.py", line 71, in plot
    plot_obj.generate()
  File "pandas/plotting/_matplotlib/core.py", line 448, in generate
    self._args_adjust()
  File "pandas/plotting/_matplotlib/hist.py", line 74, in _args_adjust
    self.bins = self._calculate_bins(self.data)
  File "pandas/plotting/_matplotlib/hist.py", line 85, in _calculate_bins
    hist, bins = np.histogram(
  File "<__array_function__ internals>", line 180, in histogram
  File "/numpy/lib/histograms.py", line 793, in histogram
    bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
  File "numpy/lib/histograms.py", line 443, in _get_bin_edges
    bin_type = np.result_type(bin_type, float)
  File "<__array_function__ internals>", line 180, in result_type
TypeError: The DType <class 'numpy.dtype[timedelta64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[timedelta64]'>, <class 'numpy.dtype[float64]'>)

Is this the intended behavior, or should a separate issue be created?

DriesSchaumont, Sep 04 '22 20:09
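Per the traceback, the TypeError comes from np.histogram, which pandas calls from _calculate_bins: NumPy cannot promote timedelta64 with float64 while computing bin edges. A possible workaround, sketched here under the assumption that days are a suitable unit, is to convert the timedeltas to floats before binning. Series.dt.total_seconds() turns NaT into NaN, which is then dropped.

```python
import numpy as np
import pandas as pd

# Same data as the original report.
t = pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-04"])
dt = t.to_series().diff()

# Convert timedelta64[ns] to float days; NaT becomes NaN and is dropped.
days = (dt.dt.total_seconds() / 86400).dropna()

# Bin the numeric values with np.histogram, the function pandas calls for hist.
counts, edges = np.histogram(days, bins=2)
print(counts, edges)
```

With matplotlib installed, days.plot.hist() plots the same numeric values. The choice of days (86400 seconds) is arbitrary; hours or seconds work the same way.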