tempo
Potential performance issue: concat is slow in pandas versions below 2.1
Issue Description:
Hello.
I have discovered a performance degradation in the .concat function of pandas version 1.5.2, and I noticed that this repository depends on pandas 1.5.2 in python/requirements.txt. I am not sure whether this performance problem in pandas will affect this repository. I found some discussions on the pandas GitHub related to this issue, including #50652 and #52685.
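To make the version condition concrete, here is a minimal sketch of a check for whether a given pandas version falls below 2.1, the threshold named in this issue's title. The helper names `parse_major_minor` and `is_below_2_1` are hypothetical and exist only for illustration. The parsing assumes a plain `major.minor.patch` string and is not a full version parser.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.5.2'.

    Assumes the first two dot-separated fields are plain integers.
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_below_2_1(version: str) -> bool:
    """True if the version is older than 2.1, the threshold named in this issue."""
    return parse_major_minor(version) < (2, 1)


# In practice, pass pandas.__version__ instead of a literal string.
print(is_below_2_1("1.5.2"))
print(is_below_2_1("2.1.0"))
```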
I also found that python/tempo/intervals.py and python/tempo/tsdf.py use the affected API. There may be more files that use it.
Suggestion
I would recommend considering an upgrade to pandas >= 2.1, or exploring other ways to optimize the performance of .concat.
Any other workarounds or solutions would be greatly appreciated.
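One general workaround that does not require changing the pandas version is to reduce how often concat is called. The sketch below compares growing a DataFrame one piece at a time with collecting the pieces and concatenating once. It is not taken from tempo, and it does not target the specific regression discussed in the linked pandas issues. The function names and the example data are hypothetical.

```python
import pandas as pd


def build_chunks(n_chunks: int, rows_per_chunk: int) -> list:
    """Create small example DataFrames (hypothetical data, for illustration only)."""
    return [
        pd.DataFrame(
            {
                "chunk": [i] * rows_per_chunk,
                "value": list(range(rows_per_chunk)),
            }
        )
        for i in range(n_chunks)
    ]


def concat_incrementally(chunks: list) -> pd.DataFrame:
    """Grow the result one frame at a time; each call copies all rows so far."""
    result = chunks[0]
    for chunk in chunks[1:]:
        result = pd.concat([result, chunk], ignore_index=True)
    return result


def concat_once(chunks: list) -> pd.DataFrame:
    """Collect the frames first, then concatenate them in a single call."""
    return pd.concat(chunks, ignore_index=True)


chunks = build_chunks(n_chunks=50, rows_per_chunk=100)
combined = concat_once(chunks)
print(combined.shape)
```

Both functions produce the same rows; the single-call version simply avoids the repeated copying of the intermediate result.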
Thank you!
Thanks @TendouArisu for raising this issue. We try to map our dependencies to those of Databricks Runtimes so we're not able to update pandas everywhere within the project.
Have you encountered any performance issues when using the methods in tsdf and intervals that call .concat?
I haven't encountered any obvious performance problems so far. What I reported is a potential performance problem, and I think it probably affects performance. I raised it because I encountered similar problems in other repositories that use pandas concat. If it is hard to update the dependencies, I don't think leaving them as they are will cause a significant impact.
Closing for now. @TendouArisu, we are aware of the issue and will keep an eye on the performance of concat. Thanks for raising this!
- Dependencies are set to match what is available in Databricks Runtime, so it's difficult to upgrade an individual dependency.
- No obvious performance problems encountered up to now.