Potential performance issue: Unreliable performance of .loc in pandas 2.0.3

Open TendouArisu opened this issue 1 year ago • 1 comments

Issue Description:

Hello. I have discovered a performance degradation in the .loc function of pandas version 2.0.3 when .loc handles a big DataFrame with a non-unique index. When .loc is given more than 4 index labels, its running time increases drastically, by about 1000x. I noticed that python/requirements-fate.txt depends on pandas version 2.0.3. I am not sure whether this performance problem in pandas will affect this repository. I found some discussions on GitHub related to this issue, including #54550 and #54746. I also found that python/fate/ml/feature_selection/hetero_feature_selection.py and python/fate/ml/statistics/statistics.py both use the affected API. There may be more files that use the affected API.
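A rough way to check whether a given environment is affected is to time .loc on a non-unique index with a growing number of labels. The sketch below is only illustrative: the DataFrame shape, the label counts, and the helper name `time_loc` are assumptions and are not taken from the linked discussions or from the FATE code.

```python
import time

import numpy as np
import pandas as pd


def time_loc(df, labels, repeats=3):
    """Return the best wall-clock time (seconds) of df.loc[labels] over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.loc[labels]
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical data: 200,000 rows whose index holds only 1,000 distinct values,
# so every label appears many times (a non-unique index).
n_rows = 200_000
rng = np.random.default_rng(0)
index = rng.integers(0, 1_000, size=n_rows)
df = pd.DataFrame({"value": np.arange(n_rows)}, index=index)

# Time lookups with 1, 2, 4, 5 and 8 labels to see whether the cost jumps
# once more than 4 labels are requested.
timings = {k: time_loc(df, list(range(k))) for k in (1, 2, 4, 5, 8)}

print("pandas", pd.__version__)
for k, seconds in timings.items():
    print(f"{k} labels: {seconds:.6f} s")
```

Running it under both pandas 2.0.3 and a 2.1+ release should show whether the jump in lookup time is present for your data shape.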

Suggestion

I would recommend considering an upgrade to pandas >= 2.1 or exploring other ways to optimize the performance of .loc. Any other workarounds or solutions would be greatly appreciated. Thank you!
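One possible workaround, if upgrading is not an option, is to select rows with a boolean mask built from `Index.isin` instead of passing a list of labels to .loc. This is a sketch under assumptions: the helper name `select_by_labels` and the toy data are hypothetical, and I have not measured whether this is faster on pandas 2.0.3, so it should be benchmarked before use. Note that the mask keeps rows in their original order, while .loc with a list returns them grouped in the order of the labels.

```python
import numpy as np
import pandas as pd


def select_by_labels(df, labels):
    """Select all rows whose index label is in `labels` using a boolean mask.

    Unlike df.loc[labels], rows come back in their original order and
    labels missing from the index are silently ignored.
    """
    return df[df.index.isin(labels)]


# Toy DataFrame with a non-unique index: each label 0..5 appears twice.
df = pd.DataFrame({"value": np.arange(12)}, index=[0, 1, 2, 3, 4, 5] * 2)
labels = [0, 1, 2, 3, 4]

via_mask = select_by_labels(df, labels)
via_loc = df.loc[labels]

# Both approaches select the same rows; only the row order differs.
print(sorted(via_mask["value"].tolist()) == sorted(via_loc["value"].tolist()))
```

Code that relies on the row order produced by .loc, or on the KeyError it raises for missing labels, would need an extra sort or an explicit check.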

TendouArisu, Feb 29 '24 08:02

Very good suggestion and finding! We will upgrade pandas to a newer version in a later release of FATE, thanks.

mgqa34, Mar 12 '24 03:03