Pandas major version 2 support

Open bartbroere opened this issue 1 year ago • 6 comments

Last April Pandas released version 2.0.0, which introduces many breaking changes. I have been submitting some pull requests here (#596, #595, #593, #592). These fix some minor things to prepare for supporting pandas>=2.0.0. So far, none of these fixes immediately breaks pandas==1.5.0 support.

However, there are also some issues that are a bit harder to resolve for version 2 without perhaps breaking some of the existing functionality.

One such example is the fact that in aggregations such as groupby, pandas has ignored the sort parameter for a long time. Tests that compare the column order between eland and pandas will fail for either pandas 1.5.0 or pandas 2.0.0.

Is the Eland project planning a major release when starting to support pandas 2? Or will it support pandas 2 by implementing different behaviour based on runtime checks of pandas' version?
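To make the second option concrete, here is a minimal sketch of what a runtime version check could look like. The helper names (`pandas_major_version`, `is_pandas_2_or_later`) are hypothetical and not part of Eland; this only illustrates the idea and is not a proposal for the actual implementation.

```python
import re


def pandas_major_version(version: str) -> int:
    """Return the leading major number of a version string such as '2.0.3'."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def is_pandas_2_or_later(version: str) -> bool:
    """True when the given pandas version string has major version 2 or higher."""
    return pandas_major_version(version) >= 2


# In a real code base the string would come from pandas itself, e.g.:
#   import pandas as pd
#   PANDAS_2 = is_pandas_2_or_later(pd.__version__)
# Code paths and test expectations could then branch on PANDAS_2.
```

Tests that compare column order could then pick the expected order for the installed major version instead of assuming one.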

bartbroere avatar Sep 14 '23 06:09 bartbroere

Ideally we should support both versions as Pandas 1.x is still generally more popular than 2.x. Thanks for all the pull requests that are moving us in the right direction. We'll have to decide when we hit more thorny issues.

pquentin avatar Sep 22 '23 13:09 pquentin

Pandas requires a minimum NumPy version of 1.22.4: https://pandas.pydata.org/docs/dev/getting_started/install.html#dependencies

Because Shap is incompatible with NumPy >= 1.24 (#539), we will have to pin NumPy to the range numpy>=1.22.4,<1.24 when upgrading Pandas.
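As a rough illustration of which NumPy releases that range admits, the sketch below checks a version string against it using only the standard library. The helpers `parse_release` and `in_numpy_pin` are hypothetical names made up for this example, and the parsing is deliberately simplified (it only reads the leading numeric components); a real project would rely on pip or the `packaging` library to evaluate the specifier.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release components, e.g. '1.23.5' -> (1, 23, 5)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_numpy_pin(version: str) -> bool:
    """True when the version falls inside numpy>=1.22.4,<1.24 (simplified check)."""
    release = parse_release(version)
    return (1, 22, 4) <= release < (1, 24)
```

For example, `in_numpy_pin("1.23.5")` is accepted, while `in_numpy_pin("1.24.0")` and `in_numpy_pin("1.22.3")` are rejected.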

davidkyle avatar Nov 16 '23 13:11 davidkyle

Looks like Shap is in a better shape now :) https://github.com/shap/shap/pull/2943. We could probably remove the numpy pin when CI is fixed. I opened https://github.com/elastic/eland/pull/636 for this.

pquentin avatar Nov 21 '23 08:11 pquentin