opensearch-py-ml

[Enhancement] Bumping pandas version from 1.5.0 to 2.x

Open thanawan-atc opened this issue 10 months ago • 15 comments

To bump the pandas version from 1.5.0 to 2.x, we have to make changes to resolve issues such as:

  • AttributeError: 'DataFrameGroupBy' object has no attribute 'mad' -> mad() was deprecated in 1.5.0 and removed in 2.0 (https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.mad.html, https://github.com/pandas-dev/pandas/issues/11787)
  • AttributeError: 'DataFrame' object has no attribute '_construct_axes_from_arguments'
  • TypeError: quantile() got an unexpected keyword argument 'numeric_only'
  • TypeError: to_csv() got an unexpected keyword argument 'line_terminator'
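
As a rough illustration of two of the errors above, here is a minimal sketch of replacements that should work on both pandas 1.5 and 2.x. The DataFrame, the column names, and the helper `mean_abs_dev` are made up for this example and are not taken from the opensearch-py-ml code base; the actual fix in the library may look different.

```python
import io

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "value": [1.0, 3.0, 10.0, 20.0]}
)


def mean_abs_dev(series: pd.Series) -> float:
    """Mean absolute deviation around the mean (what the removed mad() computed)."""
    return float((series - series.mean()).abs().mean())


# Instead of df.groupby("group")["value"].mad(), aggregate with the helper.
mad_by_group = df.groupby("group")["value"].agg(mean_abs_dev)

# Instead of to_csv(line_terminator=...), use the keyword `lineterminator`,
# which is available from pandas 1.5 onwards.
buffer = io.StringIO()
df.to_csv(buffer, index=False, lineterminator="\n")
csv_text = buffer.getvalue()

print(mad_by_group)
print(csv_text)
```

The helper computes the deviation by hand, so it does not depend on any method that was removed in 2.0.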

To see the full error on GitHub Actions,

  1. Change the pandas version for the integration test to ~2.0.1 (here and here) as below:
       export PANDAS_VERSION=${PANDAS_VERSION-2.0.1}
       @nox.parametrize("pandas_version", ["2.0.1"])
     Note that we can bump it to 2.0.3 or above as well.
  2. The integration test should run automatically when you push any changes to the branch. You can see the log by going to the Actions tab > clicking Integration tests on the right menu > choosing the test that is running on the branch with your changes.
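
For anyone unsure what the `${PANDAS_VERSION-2.0.1}` default in step 1 does, here is a small Python sketch of the same behaviour: use the environment variable if it is set, otherwise fall back to 2.0.1. The function name `pandas_requirement` is hypothetical, and mapping "~2.0.1" to the pip specifier `~=` is my assumption, not something the CI files are confirmed to do.

```python
import os


def pandas_requirement(default_version: str = "2.0.1") -> str:
    """Build a pip requirement string for pandas.

    Mirrors the shell expansion ${PANDAS_VERSION-2.0.1}: the default is used
    only when PANDAS_VERSION is not set at all.
    """
    version = os.environ.get("PANDAS_VERSION", default_version)
    # "~=" is pip's compatible-release operator (assumed equivalent of "~2.0.1").
    return f"pandas~={version}"


if __name__ == "__main__":
    print(pandas_requirement())
```

Setting `PANDAS_VERSION=2.0.3` before running would make the same function return a requirement for 2.0.3 instead.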

See example of GitHub Actions log here: https://github.com/thanawan-atc/opensearch-py-ml/actions/runs/6040143974/job/16390355081

We have to replace those deprecated functions so that the code works with pandas 2.x.

Once there are no errors in the integration test, we can then update requirements.txt, requirements-dev.txt, requirements-docs.txt, the ci file, and the noxfile to use the new pandas version.

Lastly, make sure that the integration workflow and build-deploy-doc workflow do not fail with the new pandas version.

thanawan-atc avatar Aug 31 '23 18:08 thanawan-atc

@thanawan-atc should we not go to 2.0.3 since it's the latest in that line?

dtaivpp avatar Sep 01 '23 18:09 dtaivpp

Also, do you think we should change the requirements.txt to be <2.0 as 2.1 is currently broken and is not getting added to testing?

dtaivpp avatar Sep 01 '23 18:09 dtaivpp

I believe 2.0.3 also made the integration test fail. Our plan is to update the pandas-related functions that were deprecated and then bump pandas from 1.5.0 to 2.x.

thanawan-atc avatar Sep 06 '23 21:09 thanawan-atc

@Yerzhaisang you can pick up this task. Please let me know if you have any questions regarding this.

dhrubo-os avatar Sep 06 '23 23:09 dhrubo-os

got it

Yerzhaisang avatar Sep 08 '23 06:09 Yerzhaisang

Can I work on this issue?

wuzhijing0127 avatar Oct 02 '23 17:10 wuzhijing0127

Can I work on this issue?

If I can't resolve this issue this weekend, you can take it.

Yerzhaisang avatar Oct 02 '23 17:10 Yerzhaisang

May I try on this issue?

Sylviama1026 avatar Oct 12 '23 16:10 Sylviama1026

Sure, assigning it to you.

dhrubo-os avatar Oct 12 '23 16:10 dhrubo-os

The upgrade to 2.x should also resolve #263

miguelsousa avatar Oct 12 '23 22:10 miguelsousa

Dear @dhrubo-os , can you please update this issue card?

Yerzhaisang avatar Dec 04 '23 05:12 Yerzhaisang

If this issue is still relevant and there are no updates, please reassign it to me.

Yerzhaisang avatar Dec 06 '23 20:12 Yerzhaisang

Hey @Yerzhaisang any updates? I see you created a PR last month and I can confirm it works (I ran pip install git+https://github.com/Yerzhaisang/opensearch-py-ml.git@dev) so what needs to happen in order to get this merged in?

soapergem avatar Feb 08 '24 14:02 soapergem

@soapergem Thank you for the comment.

Dear @dhrubo-os, can I start the fix from scratch and raise another PR? If yes, I will close https://github.com/opensearch-project/opensearch-py-ml/pull/366 and raise a new one. Thank you!

Yerzhaisang avatar Feb 10 '24 17:02 Yerzhaisang

Sure, go ahead.

dhrubo-os avatar Feb 10 '24 23:02 dhrubo-os