ianvs Deprecate outdated interfaces of pandas used in rank.py

Deprecate outdated interfaces of pandas used in rank.py

Open FuryMartin opened this issue 6 months ago • 4 comments

What would you like to be added/modified:

Some outdated interfaces of pandas are used in in rank.py .

We can deprecate them by:

Implement __get_all() using a more elegant interface.
using pd.concat() to merge old DataFrame and new DataFrame.

The modified method should remain compatible with the version of pandas corresponding to Python 3.6.

Why is this needed:

Ianvs was originally designed for pandas==1.1.5, but the latest version is now pandas==2.2.2.Due to a major version update, some interfaces of pandas have been deprecated in the new version.

Continuing to use these old interfaces will encounter errors on Python>=3.8.

pd.np has been deprecated in pandas>=2.0.0 : https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L208

AttributeError: module 'pandas' has no attribute 'np'

append has been deprecated in pandas>=2.0.0: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L171

AttributeError: 'DataFrame' object has no attribute 'append'

initializing all_df with np.NAN will cause str data missing: https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L145

+------+-----------+-----+-----------+----------+-----------+----------+---------------------+----------------------+-------------------+------------------------+---------------------+------+-----+
| rank | algorithm | acc | edge-rate | paradigm | basemodel | apimodel | hard_example_mining | basemodel-model_name | basemodel-backend | basemodel-quantization | apimodel-model_name | time | url |
+------+-----------+-----+-----------+----------+-----------+----------+---------------------+----------------------+-------------------+------------------------+---------------------+------+-----+
|  1   |           | 0.6 |    0.6    |          |           |          |                     |                      |                   |                        |                     |      |     |
+------+-----------+-----+-----------+----------+-----------+----------+---------------------+----------------------+-------------------+------------------------+---------------------+------+-----+

Setting value by df.[row_index][column] will cause SettingWithCopyWarning. This interface will be deprecated in pandas>=3.0 in the future. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L148

Line 151, 154, 158, 165, 167 have the same issue, too.

./ianvs/core/storymanager/rank/rank.py:148: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  all_df.loc[i][0] = algorithm.name

Assigning values one by one reduces code readability and can be simplified. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L145-L167
Using whitspace as seperator in a CSV(Comma-Separated Values) file is weird. https://github.com/kubeedge/ianvs/blob/f2352ce018f04f398b1be0f37d0fa3cd11476626/core/storymanager/rank/rank.py#L179

Aug 16 '24 15:08 FuryMartin

ianvs ianvs copied to clipboard

Deprecate outdated interfaces of pandas used in rank.py

ianvs
ianvs copied to clipboard