
Potential performance issue: .concat is slow in pandas versions below 2.1

Open TendouArisu opened this issue 1 year ago • 2 comments

Issue Description:

Hello. I have found a performance degradation in the .concat function of pandas 1.5.2, and I notice that this repository depends on pandas 1.5.2 in python/requirements.txt. I am not sure whether this pandas performance problem affects this repository. There are related discussions on the pandas GitHub, including #50652 and #52685. I also found that python/tempo/intervals.py and python/tempo/tsdf.py use the affected API, and there may be more files that do.

Suggestion

I would recommend considering an upgrade to pandas >= 2.1, or exploring other ways to improve the performance of .concat. Any other workarounds or solutions would be greatly appreciated. Thank you!
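One way to check whether the slowdown matters for a given workload is to time pd.concat on the installed pandas version and compare the result across versions. The following is only a minimal sketch on synthetic data: the helper name time_concat and the frame sizes are illustrative assumptions, not code from tempo, and it does not reproduce the exact cases discussed in the linked pandas issues.

```python
import time

import numpy as np
import pandas as pd


def time_concat(n_frames=50, n_rows=1_000, n_cols=5, repeats=3):
    """Time pd.concat over synthetic frames (hypothetical helper, not part of tempo).

    Returns the best wall-clock time in seconds and the shape of the result.
    """
    rng = np.random.default_rng(0)
    columns = [f"c{i}" for i in range(n_cols)]
    # Build all frames up front so that only the concat call is timed.
    frames = [
        pd.DataFrame(rng.standard_normal((n_rows, n_cols)), columns=columns)
        for _ in range(n_frames)
    ]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = pd.concat(frames, ignore_index=True)
        best = min(best, time.perf_counter() - start)
    return best, result.shape


elapsed, shape = time_concat()
print(f"pandas {pd.__version__}: concat to shape {shape} took {elapsed:.4f}s")
```

Running the same script under pandas 1.5.2 and under pandas >= 2.1 gives a rough comparison; for a realistic picture, the synthetic frames should be replaced with data shaped like the inputs that the tsdf and intervals methods actually pass to .concat.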

TendouArisu avatar Mar 01 '24 07:03 TendouArisu

Thanks @TendouArisu for raising this issue. We try to map our dependencies to those of Databricks Runtimes so we're not able to update pandas everywhere within the project.

Have you encountered any performance issues when using the methods in tsdf and intervals that call .concat?

R7L208 avatar Mar 01 '24 16:03 R7L208

I haven't encountered any obvious performance problems so far. This is a potential issue that I think could affect performance; I raised it because I have run into similar problems with pandas concat in other repositories. If the dependencies are hard to update, I don't think it will have a significant impact.

TendouArisu avatar Mar 01 '24 17:03 TendouArisu

Closing for now; @TendouArisu - we'll keep an eye on the performance of concat and are aware of the issue. Thanks for raising this!

  1. Dependencies are set to match what is available in Databricks Runtime, so it's difficult to upgrade an individual dependency.
  2. No obvious performance problems encountered up to now.

R7L208 avatar Jun 17 '24 18:06 R7L208