
Potential performance issue: .to_dict method slow in pandas below 2.2

Open TendouArisu opened this issue 1 year ago • 0 comments

Problem Description

Hello. I found a performance degradation in the .to_dict method of pandas 1.5.3, and I noticed that parts of this repository depend on that version. Several files, such as skrub/_table_vectorizer.py, call the affected API, and there may be others. I am not sure whether this pandas performance problem affects this repository. The problem is discussed on the pandas GitHub in #50990 and #54824.
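To check whether a given environment is affected, a small timing script can be run under each installed pandas version and the numbers compared. The sketch below is only an illustration: the helper name `time_to_dict`, the table shape, and the choice of `orient` values are assumptions, since the report does not say which call pattern is slow.

```python
import timeit

import numpy as np
import pandas as pd


def time_to_dict(n_rows=10_000, n_cols=10, orient="records", repeat=3):
    """Return the best wall-clock time (seconds) of one DataFrame.to_dict call.

    Hypothetical helper: the table shape and default orient are arbitrary
    choices for illustration, not values taken from the issue.
    """
    rng = np.random.default_rng(0)
    df = pd.DataFrame(
        rng.random((n_rows, n_cols)),
        columns=[f"col_{i}" for i in range(n_cols)],
    )
    # Run a single call per measurement and keep the fastest of `repeat` runs.
    timings = timeit.repeat(lambda: df.to_dict(orient=orient), number=1, repeat=repeat)
    return min(timings)


if __name__ == "__main__":
    print(f"pandas {pd.__version__}")
    for orient in ("dict", "list", "records"):
        print(f"orient={orient!r}: {time_to_dict(orient=orient):.4f} s")
```

Running the script once with pandas 1.5.3 and once with pandas >= 2.2 gives a direct comparison on the same machine.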

Feature Description

I would recommend considering an upgrade to pandas >= 2.2, or exploring other ways to optimize performance. Any other workarounds or solutions would be greatly appreciated. Thank you!
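If upgrading is not possible everywhere, one workaround worth benchmarking is to guard on the pandas version and build the records manually on older releases. This is a minimal sketch, not code from skrub: `pandas_at_least` and `to_records` are hypothetical helpers, the `orient="records"` use case is an assumption, and whether the fallback is actually faster on pandas < 2.2 has to be measured on real data.

```python
import pandas as pd


def pandas_at_least(major, minor):
    """Return True if the installed pandas is at least major.minor (hypothetical helper)."""
    parts = pd.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


def to_records(df):
    """Return the rows of df as a list of {column: value} dicts.

    On pandas >= 2.2 this defers to DataFrame.to_dict. On older versions it
    builds the dicts from plain tuples instead (an assumed workaround; any
    speed-up is not guaranteed and should be benchmarked).
    """
    if pandas_at_least(2, 2):
        return df.to_dict(orient="records")
    columns = list(df.columns)
    # name=None makes itertuples yield plain tuples rather than namedtuples.
    return [dict(zip(columns, row)) for row in df.itertuples(index=False, name=None)]


if __name__ == "__main__":
    example = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    print(to_records(example))
```

The version guard keeps the standard method on releases where it is already fast, so the fallback path can be dropped once the minimum supported pandas version reaches 2.2.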

Alternative Solutions

No response

Additional Context

No response

TendouArisu · Feb 29 '24 18:02