
Potential performance issue: .to_dict method slow in pandas below 2.2

Open TendouArisu opened this issue 1 year ago • 1 comment

Issue Description:

Hello. I have found a performance degradation in the .to_dict function in pandas 1.5.3, and I noticed that parts of this repository depend on that version. Several files, such as lit_nlp/examples/datasets/glue.py, use the affected API, and there may be more. I am not sure whether this pandas performance problem affects this repository. Related discussions on the pandas GitHub include #50990 and #54824.

Suggestion

I would recommend considering an upgrade to pandas >= 2.2 or exploring other ways to optimize the performance. Any other workarounds or solutions would be greatly appreciated. Thank you!
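To check whether an installed pandas version is affected, something like the following rough sketch could be used. It times `DataFrame.to_dict` against a hand-rolled conversion. Everything specific here is an assumption for illustration only: the `orient="records"` call pattern, the synthetic `sentence`/`label` columns, and the helper names `records_via_zip` and `time_both` are hypothetical and not taken from the LIT code base.

```python
from __future__ import annotations

import timeit

import pandas as pd


def records_via_zip(df: pd.DataFrame) -> list[dict]:
    """Build a list of row dicts without calling DataFrame.to_dict.

    Hypothetical workaround; assumes column names are unique.
    """
    columns = list(df.columns)
    # Series.tolist() converts NumPy scalars to plain Python values.
    column_values = [df[col].tolist() for col in columns]
    return [dict(zip(columns, row)) for row in zip(*column_values)]


def time_both(n_rows: int = 10_000, repeats: int = 3) -> dict[str, float]:
    """Return the best wall-clock time in seconds for each approach."""
    # Synthetic data, only loosely shaped like a text classification dataset.
    df = pd.DataFrame({
        "sentence": [f"example {i}" for i in range(n_rows)],
        "label": [i % 2 for i in range(n_rows)],
    })
    return {
        "to_dict": min(timeit.repeat(
            lambda: df.to_dict(orient="records"), number=1, repeat=repeats)),
        "zip": min(timeit.repeat(
            lambda: records_via_zip(df), number=1, repeat=repeats)),
    }


if __name__ == "__main__":
    print("pandas", pd.__version__, time_both())
```

Running this under both the pinned pandas version and pandas >= 2.2 would show whether the gap matters for data of this size. The results depend on the machine and the column dtypes, so they are only a rough indication.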

TendouArisu, Feb 29 '24 17:02

Thanks for the report! There are a few (significant) version bumps in the works for LIT and I'll add this to the list. Will keep you updated on progress as best I can.

RyanMullins, Feb 29 '24 17:02