danfojs

merge has exponential complexity crashing the runtime

Open kozmaz87 opened this issue 1 year ago • 1 comments

Describe the bug Take 4 DataFrames, each with 200 rows and 3 columns, and merge them one after another in a loop. By the 3rd iteration the memory footprint balloons to over 4GB. At that point many runtimes kill the application, including the Excel one where I was attempting to do this.

To Reproduce Steps to reproduce the behavior:

  1. Create 4 DataFrames with 200 rows each, and 3 columns with a datetime index.
  2. Merge on the datetime column with how: 'outer'
  3. The memory footprint increases exponentially loop by loop.

Expected behavior It should not crash on a 200-record merge, and the memory footprint should grow linearly with the number of rows.
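
For comparison, here is a minimal sketch of an outer merge whose memory use stays linear in the total number of rows. It is plain JavaScript working on arrays of row objects, not danfojs code, and `outerMergeOnKey` is a hypothetical helper name. It assumes each key appears at most once per table and that the non-key column names differ between tables (a repeated column name would be overwritten by the later table).

```javascript
// Hypothetical helper, not part of danfojs: outer-merge several tables
// (arrays of plain row objects) on a single key column using a Map.
function outerMergeOnKey(tables, key) {
  const merged = new Map(); // normalized key value -> combined row

  for (const rows of tables) {
    for (const row of rows) {
      // Dates are compared by timestamp so equal instants share one entry.
      const k = row[key] instanceof Date ? row[key].getTime() : row[key];
      const target = merged.get(k) ?? { [key]: row[key] };
      Object.assign(target, row); // copy this table's columns into the combined row
      merged.set(k, target);
    }
  }

  // Collect every column name so missing cells can be filled with null.
  const columns = new Set();
  for (const row of merged.values()) {
    for (const name of Object.keys(row)) columns.add(name);
  }

  // Sort by key and emit rows that all have the same set of columns.
  return [...merged.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([, row]) => {
      const out = {};
      for (const name of columns) out[name] = name in row ? row[name] : null;
      return out;
    });
}

// Small usage example with numeric keys standing in for timestamps.
const a = [{ ts: 1, x: 10 }, { ts: 2, x: 20 }];
const b = [{ ts: 2, y: 200 }, { ts: 3, y: 300 }];
const result = outerMergeOnKey([a, b], "ts");
```

Each input row is visited once and the output holds at most one row per distinct key, so four 200-row tables produce at most 800 output rows regardless of how many tables are merged.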

Desktop (please complete the following information):

  • OS: Windows
  • Browser: Excel runtime (Edge)
  • Version: No idea... internal to Excel

Additional context The reason I tried to make this work is to avoid having to write a merge sort manually. :( JS is not my strong suit.

kozmaz87 Mar 05 '24 17:03

@kozmaz87 Thanks for raising this. We need to take a second look at the way we handle merge. I think it can be optimized, and memory usage should not be that high.

risenW Apr 02 '25 16:04