BUG: int Overflow with DateFormatter
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import datetime as dt
from matplotlib import dates
import pandas as pd
dates.set_epoch("2023-01-01") # suggested workaround doesn't help
start = dt.datetime(2022, 4, 4, 7)
end = dt.datetime(2022, 4, 4, 16)
index = pd.date_range(start, end, freq="s")
df = pd.DataFrame(index=index)
df["col"] = 1
ax = df.plot()
ax.xaxis.set_major_formatter(dates.DateFormatter("%H:%M")) # breaks
Issue Description
See also: #18348
When setting a matplotlib.dates.DateFormatter, I get the following exception:
OverflowError: int too big to convert
Full Traceback:
runfile('/home/lukas/.config/spyder-py3/temp.py', wdir='/home/lukas/.config/spyder-py3')
[autoreload of pandas.core.arrays.timedeltas failed: Traceback (most recent call last):
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 273, in check
superreload(m, reload, self.old_objects)
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 471, in superreload
module = reload(module)
^^^^^^^^^^^^^^
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/importlib/__init__.py", line 169, in reload
_bootstrap._exec(spec, module)
File "<frozen importlib._bootstrap>", line 621, in _exec
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/core/arrays/timedeltas.py", line 34, in <module>
from pandas._libs.tslibs.fields import (
ImportError: cannot import name 'get_timedelta_days' from 'pandas._libs.tslibs.fields' (/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/_libs/tslibs/fields.cpython-311-x86_64-linux-gnu.so)
]
[autoreload of pandas._testing failed: Traceback (most recent call last):
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 273, in check
superreload(m, reload, self.old_objects)
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 471, in superreload
module = reload(module)
^^^^^^^^^^^^^^
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/importlib/__init__.py", line 169, in reload
_bootstrap._exec(spec, module)
File "<frozen importlib._bootstrap>", line 621, in _exec
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/_testing/__init__.py", line 914, in <module>
cython_table = pd.core.common._cython_table.items()
^^^^^^^^^^^^^^
AttributeError: module 'pandas.core' has no attribute 'common'
]
Error in callback <function _draw_all_if_interactive at 0x7f4dc0f96020> (for post_execute):
Traceback (most recent call last):
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/pyplot.py:120 in _draw_all_if_interactive
draw_all()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/_pylab_helpers.py:132 in draw_all
manager.canvas.draw_idle()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backend_bases.py:2082 in draw_idle
self.draw(*args, **kwargs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backends/backend_agg.py:400 in draw
self.figure.draw(self.renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:95 in draw_wrapper
result = draw(artist, renderer, *args, **kwargs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/figure.py:3140 in draw
mimage._draw_list_compositing_images(
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
a.draw(renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axes/_base.py:3064 in draw
mimage._draw_list_compositing_images(
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
a.draw(renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1376 in draw
ticks_to_draw = self._update_ticks()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1263 in _update_ticks
major_labels = self.major.formatter.format_ticks(major_locs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in format_ticks
return [self(value, i) for i, value in enumerate(values)]
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in <listcomp>
return [self(value, i) for i, value in enumerate(values)]
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:651 in __call__
result = num2date(x, self.tz).strftime(self.fmt)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:544 in num2date
return _from_ordinalf_np_vectorized(x, tz).tolist()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2329 in __call__
return self._vectorize_call(func=func, args=vargs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2412 in _vectorize_call
outputs = ufunc(*inputs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:359 in _from_ordinalf
np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
OverflowError: int too big to convert
Traceback (most recent call last):
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/core/formatters.py:340 in __call__
return printer(obj)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/core/pylabtools.py:152 in print_figure
fig.canvas.print_figure(bytes_io, **kw)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backend_bases.py:2342 in print_figure
self.figure.draw(renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:95 in draw_wrapper
result = draw(artist, renderer, *args, **kwargs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/figure.py:3140 in draw
mimage._draw_list_compositing_images(
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
a.draw(renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axes/_base.py:3064 in draw
mimage._draw_list_compositing_images(
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
a.draw(renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
return draw(artist, renderer)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1376 in draw
ticks_to_draw = self._update_ticks()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1263 in _update_ticks
major_labels = self.major.formatter.format_ticks(major_locs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in format_ticks
return [self(value, i) for i, value in enumerate(values)]
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in <listcomp>
return [self(value, i) for i, value in enumerate(values)]
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:651 in __call__
result = num2date(x, self.tz).strftime(self.fmt)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:544 in num2date
return _from_ordinalf_np_vectorized(x, tz).tolist()
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2329 in __call__
return self._vectorize_call(func=func, args=vargs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2412 in _vectorize_call
outputs = ufunc(*inputs)
File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:359 in _from_ordinalf
np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
OverflowError: int too big to convert
<Figure size 432x288 with 1 Axes>
Expected Behavior
No exceptions are raised, and the major ticks selected by the AutoDateLocator are formatted in hh:mm format.
Installed Versions
INSTALLED VERSIONS
commit : 37ea63d540fd27274cad6585082c91b1283f963d
python : 3.11.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-135-generic
Version : #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.1
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
Thanks @joooeey for the report
this actually works if you use matplotlib directly

This confirms that the issue's indeed on the pandas.plotting side
I suspect (but haven't checked) that somewhere in pandas.plotting, there's an assumption that all pandas datetimes still use nanosecond resolution
Contributions would be welcome; this one might not be too tricky for newcomers
take
From my understanding so far, the issue is that matplotlib requires the number of days from the epoch (01-01-1970) for the datetime conversion, while pandas passes a float value of seconds elapsed since the epoch.
The missing piece is to understand where exactly this float value in seconds comes from in pandas. It possibly comes from TimeSeries_DateFormatter.set_locs(), but I need to investigate further to be sure, which I'll continue to do over the next few days.
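The unit mismatch described above can be checked with plain arithmetic. The following is a minimal, stdlib-only sketch that calls neither pandas nor matplotlib. MUSECONDS_PER_DAY mirrors the constant visible in the traceback, while INT64_MAX and fits_in_int64 are hypothetical names introduced for illustration, on the assumption that the overflow comes from np.timedelta64 being backed by a signed 64-bit integer.

```python
import datetime as dt

# Microseconds per day; mirrors the constant named in the traceback
# (MUSECONDS_PER_DAY in matplotlib/dates.py), redefined here so the
# sketch needs only the standard library.
MUSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000
# Assumption: np.timedelta64 stores its value in a signed 64-bit integer.
INT64_MAX = 2**63 - 1

EPOCH = dt.datetime(1970, 1, 1)
timestamp = dt.datetime(2022, 4, 4, 7)  # start of the index in the report

seconds_since_epoch = (timestamp - EPOCH).total_seconds()
days_since_epoch = seconds_since_epoch / (24 * 60 * 60)


def fits_in_int64(x):
    """Hypothetical helper: does int(round(x * MUSECONDS_PER_DAY)) fit in int64?"""
    return abs(int(round(x * MUSECONDS_PER_DAY))) <= INT64_MAX


# A value in days (the unit matplotlib expects) stays within range.
print(fits_in_int64(days_since_epoch))     # True
# The same instant expressed in seconds is far too large once scaled.
print(fits_in_int64(seconds_since_epoch))  # False
```

The seconds value is 86400 times larger than the days value, which is enough to push the scaled result past the 64-bit limit.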
thanks for the investigation @PrimeF ! sounds good - take your time, no hurry
@MarcoGorelli: I've looked into the issue further and the Period values in seconds actually originate from: https://github.com/pandas-dev/pandas/blob/103d3b2bb8912c87f3ec11c1048e7e0a19225fef/pandas/plotting/_matplotlib/converter.py#L252-L255
In order to get the date in elapsed days since epoch as required by the operations performed in the DateFormatter (see https://github.com/matplotlib/matplotlib/blob/2e2d2d5f574ad43ba87fc893098345db5eb1eacc/lib/matplotlib/dates.py#L357) one could instead leverage this branch https://github.com/pandas-dev/pandas/blob/103d3b2bb8912c87f3ec11c1048e7e0a19225fef/pandas/plotting/_matplotlib/converter.py#L256-L257 and pass freq='D'.
However, this solution seems to be highly "customised" to this issue, and would therefore require specific conditions to avoid erroneously entering the branch and breaking other things down the line. I am not sure this approach is the best one to follow, so I would appreciate any feedback.
thanks for the investigation, I'll take a look
This bug has been discussed on the matplotlib side, and there it sounded like an issue with a different definition of the epoch. Maybe this helps. I'll also add a reference to this issue on the matplotlib issue.
I dug into this a bit since the PeriodConverter logic looked very strange to me (as an end-user), and it seems like the bulk of the PeriodConverter logic was written in 2012.
It seems possible to me that PeriodConverter was only ever intended to be used with freq="D". Otherwise, PeriodConverter._convert_1d() can return very different values depending on the frequency of the given periodic time series:
from pandas import Period
from pandas.plotting._matplotlib.converter import get_datevalue  # private pandas helper
period = Period("2023-11-11 00:01.000", "T")
get_datevalue(period, "D") # 19672, convention that matplotlib uses e.g. matplotlib.dates.date2num(pd.Timestamp("2023-11-11 00:01.000"))
get_datevalue(period, "T") # 28327681
get_datevalue(period, "L") # 169947861900
So if a small time frequency like "T" (minutes) is given, at some point matplotlib will call something like num2date() on view limits around 28327681 to format the x-labels, at which point an OverflowError will be raised. Contrast with DatetimeConverter, which carefully calls mdates.date2num() and so will return values around 19672.
> and pass freq='D'.
Unfortunately changing the frequency from "T" to "D" here would lose resolution, e.g. all of the timestamps in the above example would get cast to 19672. It seems to me that the most reasonable thing to do would be to just defer to DatetimeConverter and possibly remove PeriodConverter entirely, although I could be missing some context where PeriodConverter is actually useful.
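The loss of resolution can be illustrated without pandas. The sketch below is a stdlib-only analogy: day_ordinal and fractional_days are hypothetical helper names, not pandas or matplotlib APIs, and the epoch of 1970-01-01 is assumed to match matplotlib's default. Whole-day ordinals collapse two timestamps one minute apart onto the same value, whereas fractional days keep them distinct.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)  # assumed to match matplotlib's default epoch
ONE_DAY = dt.timedelta(days=1)


def day_ordinal(ts):
    """Whole days since the epoch, analogous to a daily Period ordinal."""
    return (ts - EPOCH) // ONE_DAY


def fractional_days(ts):
    """Days since the epoch as a float, the unit matplotlib date numbers use."""
    return (ts - EPOCH) / ONE_DAY


a = dt.datetime(2023, 11, 11, 0, 1)
b = dt.datetime(2023, 11, 11, 0, 2)

# Both timestamps collapse onto the same whole-day value.
print(day_ordinal(a), day_ordinal(b))           # 19672 19672
# Fractional days still distinguish them.
print(fractional_days(a) < fractional_days(b))  # True
```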
Another frustrating issue that led me here is that the pandas plotting converters will try to auto-infer a periodic frequency, even if the index to be plotted is a non-periodic DatetimeIndex:
import pandas as pd
import matplotlib.dates as mdates
ts = pd.Timestamp("2023-11-11 00:00:00", tz="UTC")
ts_plus_n = lambda x: ts + pd.Timedelta("1S") * x # add x seconds
s = pd.Series([0, 1, 2, 3], index=[ts, ts_plus_n(1), ts_plus_n(2), ts_plus_n(99)])
# assert s.index.inferred_freq is None and s.iloc[0:3].index.inferred_freq == "S"
In [2]: s.iloc[0:4].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X")) # plots as expected
In [3]: s.iloc[0:3].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X")) # OverflowError!
> It seems to me that the most reasonable thing to do would be to just defer to DatetimeConverter and possibly remove PeriodConverter entirely, although I could be missing some context where PeriodConverter is actually useful.
I welcome removing anything Period-related, fancy trying this out and making a PR?
Seems like there is some history to this and it's been debated a few times: #7670 #9053 #15071 #18768 #26253
#15071 provides the best summary, the main benefit (or drawback, depending on one's viewpoint) of PeriodConverter is that some custom tick labelling can be done by pandas.
jorisvandenbossche: The question is what the reason is that we convert DatetimeIndex to periods for plotting. The reasons I can think of:
- Performance. Currently, the regular plotting is faster (so for a regular series ts.plot() is faster as ts.plot(x_compat=True)). However, I think this could be solved as most of the time is spent in converting the datetimes to floats (which should be vectorizable).
- Nicer tick label locations and formatting. This is a clear plus, our (convoluted) ticklocators and formatters give much nicer results as the default matplotlib (IMO)
Wes even replied at one point:
> I am not especially attached to it -- if you can unify / have a single code path for plotting without significantly changing functionality, sounds good to me.
IMHO the user should explicitly request the period-based formatting instead of pandas trying to automagically figure it out, but removing PeriodConverter will likely cause the default tick formatting for periodic time series to change, so it might still be a bit controversial.
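Until the default changes, the x_compat=True option mentioned in the quoted comment may serve as a workaround, since it asks pandas to skip the period-based conversion. The following is a sketch under that assumption, reusing the data from the original report. It has not been checked against every pandas/matplotlib combination, and the Agg backend is chosen only so it runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
from matplotlib import dates
import pandas as pd

start = dt.datetime(2022, 4, 4, 7)
end = dt.datetime(2022, 4, 4, 16)
df = pd.DataFrame({"col": 1}, index=pd.date_range(start, end, freq="s"))

# x_compat=True is expected to make pandas pass matplotlib-style date
# numbers (days since the epoch) to the axis instead of Period ordinals.
ax = df.plot(x_compat=True)
ax.xaxis.set_major_formatter(dates.DateFormatter("%H:%M"))

# Force a draw; this is the step at which the OverflowError surfaced
# in the report when x_compat was not set.
ax.figure.canvas.draw()
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

The trade-off is that pandas' own period-based tick locators and formatters are not used in this mode.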
I think the following could be done as first steps:
- No longer infer periodic frequency when plotting DatetimeIndex. It seems to drive more confusion than help: for example the issues linked above and a few stackoverflow qs like https://stackoverflow.com/questions/21189954/. It's sufficient to remove this branch (can emit a FutureWarning if a softer transition is desired): https://github.com/pandas-dev/pandas/blob/157631d97840b7918eec4c8b40bd9c24b25771a7/pandas/plotting/_matplotlib/timeseries.py#L268
- In pandas/plotting/_matplotlib/converter.py, we can change PeriodConverter._convert_1d https://github.com/pandas-dev/pandas/blob/157631d97840b7918eec4c8b40bd9c24b25771a7/pandas/plotting/_matplotlib/converter.py#L265 to call mdates.date2num() (for example by calling DatetimeConverter) so that all time series plots will be plotted with the x value units that matplotlib expects. To soften the transition and allow the period tick formatting code in TimeSeries_Date{Locator,Formatter} to still work, the reverse mapping lambda x: Period(mdates.date2num(x), freq=self.freq).ordinal can be applied inside their methods.
- TimeSeries_* can be renamed to PeriodSeries_* to make the intent more clear.
The pandas period tick formatter does seem a bit nicer than matplotlib's in some situations (although worse in others), so I'm more hesitant than before to try to remove the period-related code entirely.
I linked a few more directly related issues, and there are a few more linked from the issues in my comment above, so a double-digit number of issues on the tracker stem from the same underlying problem. Since pandas is changing the time unit when plotting, any other matplotlib feature involving x-coordinates (mdates.DateFormatter, format_coord, tooltips, axvspan or any other artist plotted with datetimes, shared x-axes, etc) may break. I've checked that most of these issues are resolved by the fix proposed in the previous comment.
For tracking, there's a few other classes of issues related to Period plotting:
- Performance / memory blowup: #41064 #47519 #53684. Roughly speaking, matplotlib will call the locator a lot, and each time pandas fully re-computes _daily_finder(), which constructs a full date range with the GCD frequency that can result in a very large array. Caching can solve the first issue; the second issue could maybe be solved by downsampling the frequency.
- Mis-aligned ticks: #29705 #29719 #45320 #46961. Have not looked into these.
- Minor edge cases: #39705
Sorry if this is off-topic, but I'm posting this for those who may encounter the same issue. I'm using matplotlib and encountered this error. I was feeding Unix epoch timestamps in seconds to the x axis and then tried to use matplotlib.dates to format the data, but I got the overflow error. It turns out that _from_ordinalf expects "days after epoch" instead of "seconds after epoch", so I divided the timestamp by 24 * 60 * 60, and then everything worked fine.
Didn't get a chance to try completing this since the previous comments, but here's a proof-of-concept that uses DatetimeConverter instead of PeriodConverter for PeriodIndex: https://github.com/pandas-dev/pandas/compare/main...azhu-tower:pandas:period-converter An example with this commit is the plot in this comment: https://github.com/pandas-dev/pandas/issues/55110#issuecomment-1869499403. IIRC, I had profiled that the commit introduces some slowness, likely due to calling date2num on each tick instead of across all ticks.
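For readers hitting the same problem outside pandas, the conversion described above can be sketched as follows. It uses only the standard library; unix_seconds_to_days and days_to_datetime are hypothetical helper names, and the 1970-01-01 epoch is an assumption that matches matplotlib's default but not a custom epoch set via set_epoch.

```python
import datetime as dt

SECONDS_PER_DAY = 24 * 60 * 60
# Assumption: the plotting epoch is 1970-01-01 UTC (matplotlib's default).
EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def unix_seconds_to_days(unix_seconds):
    """Convert seconds since the Unix epoch to fractional days since the epoch."""
    return unix_seconds / SECONDS_PER_DAY


def days_to_datetime(days):
    """Hypothetical inverse, used only to sanity-check the conversion."""
    return EPOCH + dt.timedelta(days=days)


unix_seconds = 1_649_055_600  # 2022-04-04 07:00:00 UTC
x = unix_seconds_to_days(unix_seconds)

print(x)                    # value in days, suitable as an x coordinate
print(days_to_datetime(x))  # should recover the original instant
```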
Anyone have a PR? I'm still encountering this issue with the latest version.
I am also encountering this issue with the latest version.