eland
Pandas major version 2 support
Last April, Pandas released version 2.0.0, which introduces many breaking changes. I have been submitting some pull requests here (#596, #595, #593, #592). These fix some minor things to prepare for supporting pandas>=2.0.0. None of the fixes so far break support for pandas==1.5.0.
However, some issues are harder to handle when upgrading to version 2 without possibly breaking existing functionality.
One example is that in aggregations such as groupby, pandas has ignored the sort parameter for a long time. As a result, tests that compare column order between eland and pandas will fail under either pandas 1.5.0 or pandas 2.0.0.
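To make the ordering difference concrete, here is a small stdlib-only sketch of the documented meaning of groupby's `sort` parameter: with `sort=True` the group keys come out sorted, and with `sort=False` they keep their order of first appearance. It only illustrates the ordering rule. It is not pandas or eland code, and `group_key_order` is a hypothetical name.

```python
from typing import Iterable, List


def group_key_order(keys: Iterable[str], sort: bool) -> List[str]:
    """Return the order in which group keys would appear in an aggregation result.

    Hypothetical helper mimicking groupby's `sort` parameter:
    - sort=True: keys are returned in sorted order
    - sort=False: keys are returned in order of first appearance
    """
    # dict.fromkeys de-duplicates while preserving insertion (first-appearance) order
    unique_keys = list(dict.fromkeys(keys))
    return sorted(unique_keys) if sort else unique_keys


rows = ["b", "a", "b", "c", "a"]
print(group_key_order(rows, sort=True))   # ['a', 'b', 'c']
print(group_key_order(rows, sort=False))  # ['b', 'a', 'c']
```

A test that hard-codes one of these two orders passes on whichever pandas version produces it and fails on the other, which is the situation described above.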
Is the Eland project planning a major release when starting to support pandas 2? Or will it support pandas 2 by implementing different behaviour based on runtime checks of pandas' version?
Ideally we should support both versions as Pandas 1.x is still generally more popular than 2.x. Thanks for all the pull requests that are moving us in the right direction. We'll have to decide when we hit more thorny issues.
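Supporting both major versions at once means branching on the installed pandas version at runtime, as suggested in the question above. The following is a minimal stdlib-only sketch, assuming that a major/minor comparison is enough. `parse_major_minor` and `is_pandas_2_or_later` are hypothetical helpers, not existing eland functions.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.0.3' or '1.5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_pandas_2_or_later(version: str) -> bool:
    # Hypothetical flag helper: compares only major/minor, ignoring patch and pre-release tags
    return parse_major_minor(version) >= (2, 0)


# In library code this would typically be evaluated once at import time, e.g.
#   import pandas as pd
#   PANDAS_2 = is_pandas_2_or_later(pd.__version__)
# and then consulted wherever pandas 1.x and 2.x behave differently.
print(is_pandas_2_or_later("1.5.3"))     # False
print(is_pandas_2_or_later("2.0.0"))     # True
print(is_pandas_2_or_later("2.1.0rc0"))  # True
```

For full PEP 440 semantics (for example, treating `2.0.0rc0` as older than `2.0.0`), `packaging.version.Version` is the more robust choice than a hand-rolled regex.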
Pandas requires NumPy 1.22.4 as the minimum version: https://pandas.pydata.org/docs/dev/getting_started/install.html#dependencies
Because Shap is incompatible with NumPy >= 1.24 (#539), we will have to pin NumPy to the range numpy>=1.22.4,<1.24 when upgrading Pandas.
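As a rough illustration of which NumPy releases the specifier `numpy>=1.22.4,<1.24` admits, here is a stdlib-only sketch. It is not how pip resolves requirements, and `release_tuple` and `satisfies_numpy_pin` are hypothetical helpers. It compares only the numeric release segment and ignores pre-release tags.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '1.23.5' -> (1, 23, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_numpy_pin(version: str) -> bool:
    """Check a version against the pin numpy>=1.22.4,<1.24 (pre-release tags ignored)."""
    release = release_tuple(version)
    # Pad to three components so that '1.22' compares as 1.22.0
    release = release + (0,) * (3 - len(release))
    return (1, 22, 4) <= release < (1, 24, 0)


print(satisfies_numpy_pin("1.23.5"))  # True
print(satisfies_numpy_pin("1.24.0"))  # False
print(satisfies_numpy_pin("1.22.3"))  # False
```

For exact PEP 440 behaviour, including pre-releases, `packaging.specifiers.SpecifierSet(">=1.22.4,<1.24").contains("1.23.5")` performs the same check properly.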
It looks like Shap is in better shape now :) https://github.com/shap/shap/pull/2943. We could probably remove the NumPy pin once CI is fixed. I opened https://github.com/elastic/eland/pull/636 for this.