pandas
Update pyarrow dependency
Pandas 1.4 currently requires pyarrow 1.0.1 (released August 2020). This issue is about discussing an update to the required pyarrow version, as suggested in #47781.
https://github.com/pandas-dev/pandas/pull/47781 implements a performance improvement that would require pyarrow 3.0 (released January 2021).
Pyarrow now moves pretty fast, with a new release roughly every 3 months that adds major functionality. Efforts such as arrow-backed storage would therefore probably also benefit from regularly updating the pyarrow dependency in pandas (as has been done in previous versions).
I would be in favor of upgrading even to 4.0 (released April 26, 2021). We have had some CI issues related to pyarrow CSV reading with versions 2 and 3.
cc @jorisvandenbossche
Note that in the past we already required a newer pyarrow version specifically for the StringDtype functionality. See:
https://github.com/pandas-dev/pandas/blob/c8fc47ba714f9ac181905f4d9674b4a43772e833/pandas/core/arrays/string_arrow.py#L56-L59
(That check dates from a time when we still supported older pyarrow versions. Given that we now require 1.0.1 globally, it is a bit obsolete.)
So we could easily bump the required pyarrow version for StringDtype alone, while still allowing pyarrow 1.0 for the Parquet IO.
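To make the per-feature idea concrete, here is a minimal sketch of how such a check could look. It is not the code in `string_arrow.py`; the names `FEATURE_MINIMUMS` and `check_feature` are hypothetical, and the 3.0.0 minimum for StringDtype is only an assumed value for illustration.

```python
# Hypothetical per-feature minimum pyarrow versions (illustrative values only).
FEATURE_MINIMUMS = {
    "parquet": "1.0.1",       # the current global minimum
    "string_dtype": "3.0.0",  # assumed stricter minimum for one feature
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.0' or '3.0.0' into (3, 0, 0).

    Only handles plain numeric versions; pre-release suffixes such as
    '.dev0' are out of scope for this sketch.
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '3.0' compares equal to '3.0.0'
    return tuple(parts[:3])


def check_feature(feature: str, installed: str) -> None:
    """Raise ImportError if the installed pyarrow is too old for `feature`."""
    minimum = FEATURE_MINIMUMS[feature]
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"pyarrow>={minimum} is required for {feature}, found {installed}."
        )


# With pyarrow 1.0.1 installed, Parquet IO passes the check...
check_feature("parquet", "1.0.1")
# ...while the stricter feature would raise ImportError:
# check_feature("string_dtype", "1.0.1")
```

In practice the installed version would come from `pyarrow.__version__`; it is passed in as a string here so the sketch runs without pyarrow installed.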
Now, I am certainly not against increasing our minimum version. But as a data point: a month ago we noticed that, based on PyPI download data, pyarrow 2.0 is still widely used (there is probably some often-used package that pins it). In general, we notice that quite a few downstream packages of pyarrow lag behind in supporting the latest pyarrow versions. Of course, you could also say that people pinned to an older pyarrow can use an older pandas too.
For example, for numpy the rule is to support all versions released in the 24 months prior to the project release (https://numpy.org/neps/nep-0029-deprecation_policy.html). If we took the same rule for pyarrow, we could drop pyarrow 1.0 now but would still support pyarrow 2.0 (and we could start requiring pyarrow 3.0 in the release after 1.5 / starting from October).
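As a rough illustration of that 24-month window, the sketch below computes which pyarrow versions would still be supported for a given pandas release date. The release months for 1.0.1, 3.0 and 4.0 come from this thread; the October 2020 date for 2.0 is inferred from the comment above, the day-of-month values (other than April 26) are placeholders, and the pandas release dates are assumptions chosen for the example.

```python
from datetime import date

# pyarrow release dates; days other than 4.0's are placeholders.
PYARROW_RELEASES = {
    "1.0.1": date(2020, 8, 1),
    "2.0": date(2020, 10, 1),  # month inferred, not stated in the thread
    "3.0": date(2021, 1, 1),
    "4.0": date(2021, 4, 26),
}


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    # Clamp the day to 28 so the result is always a valid date.
    return date(year, month_index + 1, min(d.day, 28))


def supported_versions(releases: dict, project_release: date, window: int = 24) -> list:
    """Versions released within `window` months before `project_release`."""
    cutoff = months_before(project_release, window)
    return [v for v, released in releases.items() if released >= cutoff]


# Assumed pandas release dates, for illustration only.
print(supported_versions(PYARROW_RELEASES, date(2022, 9, 1)))
# ['2.0', '3.0', '4.0']
print(supported_versions(PYARROW_RELEASES, date(2022, 11, 1)))
# ['3.0', '4.0']
```

With these assumed dates, a release in September 2022 would drop 1.0.1 but keep 2.0, and a release after October 2022 could require 3.0, which matches the reasoning above.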