
Large Memory and Speed performance regression from 2.5.3 to 2.6.3

Open azrazalea-debtbook opened this issue 4 months ago • 6 comments

We upgraded to Rails 7.1.3 and, at the same time, I upgraded creek from 2.5.3 to 2.6.3.

After deploying the new Rails version, we noticed that both our memory usage and the time taken to process spreadsheets had gone up significantly.

~~I tested the two versions in isolation in irb and saw no difference, with a slight improvement for 2.6.3.~~ However, when running in Docker the difference is massive.

(See https://github.com/pythonicrubyist/creek/issues/122#issuecomment-1957764405: my testing methodology on my local Mac was flawed. There the difference is smaller but still large, so this is not specific to Docker.)

```
[3] [app][development] pry(main)> Creek::VERSION
=> "2.5.3"
[4] [app][development] pry(main)> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
8.105815712
[5] [app][development] pry(main)> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
38.974233879
[6] [app][development] pry(main)>
VmPeak:	  709908 kB
VmSize:	  593128 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	  616644 kB
VmRSS:	  518568 kB
VmData:	  559256 kB
VmStk:	    8188 kB
VmExe:	       4 kB
VmLib:	   24516 kB
VmPTE:	    1372 kB
VmSwap:	       0 kB
```

```
[1] [app][development] pry(main)> Creek::VERSION
=> "2.6.3"
[2] [app][development] pry(main)> time = Time.now; creek = Creek::Book.new('performance-test.xlsx'); puts(Time.now - time);
8.186497337
[3] [app][development] pry(main)> time = Time.now; creek.sheets[0].simple_rows.map(&:inspect); puts(Time.now - time);
66.137462044
[4] [app][development] pry(main)>
VmPeak:	 3852528 kB
VmSize:	  603644 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 3765544 kB
VmRSS:	  529308 kB
VmData:	  569772 kB
VmStk:	    8188 kB
VmExe:	       4 kB
VmLib:	   24516 kB
VmPTE:	    1520 kB
VmSwap:	       0 kB
```

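As the output above shows, the problem appears to be the row generator: `Creek::Book.new` takes about 8 seconds on both versions, while reading `simple_rows` goes from 39.0 s to 66.1 s (about 1.7 times slower). Peak memory usage (VmPeak) also went from 709,908 kB to 3,852,528 kB, about 5.4 times as much!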
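
For anyone who wants to check the arithmetic, here is a small sketch that computes the 2.6.3 / 2.5.3 ratios. The numbers are copied from the two sessions above; the hash keys and variable names are made up for illustration only.

```ruby
# Figures copied from the two pry sessions (times in seconds, memory in kB).
# The key names are illustrative labels, not anything creek defines.
v253 = { book_new: 8.105815712, simple_rows: 38.974233879, vm_peak: 709_908, vm_hwm: 616_644 }
v263 = { book_new: 8.186497337, simple_rows: 66.137462044, vm_peak: 3_852_528, vm_hwm: 3_765_544 }

# Ratio of the 2.6.3 figure to the 2.5.3 figure, rounded to two decimals.
ratios = v253.to_h { |key, before| [key, (v263[key] / before.to_f).round(2)] }

ratios.each { |key, ratio| puts "#{key}: #{ratio}x" }
```

The `book_new` ratio stays close to 1, which is why the regression looks confined to the row generator rather than to opening the workbook.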

Here is the spreadsheet I used for testing (a subset of Chicago's public crime data): https://drive.usercontent.google.com/download?id=1zWiCRYCS7Vs9EPZyOqcKr-eT59F7w3dm&export=download (give it a minute to start downloading; Google has to virus-scan it first, which takes a long time).
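
To make this easier to reproduce, here is a rough sketch of a script that times the same two calls and prints the memory high-water marks. It assumes Linux (it reads `/proc/self/status`, which does not exist on macOS) and that the spreadsheet has been saved as `performance-test.xlsx` in the working directory. `parse_vm_fields` and `measure` are hypothetical helper names, not part of creek.

```ruby
require "benchmark"

# Hypothetical helper: pull selected fields (in kB) out of /proc/<pid>/status text.
def parse_vm_fields(status_text, fields = %w[VmPeak VmHWM VmRSS])
  status_text.each_line.each_with_object({}) do |line, out|
    key, value = line.split(":", 2)
    next unless fields.include?(key) && value
    out[key] = value[/\d+/].to_i
  end
end

# Hypothetical helper: time a block, then report the process memory figures.
# Returns [elapsed_seconds, vm_fields]; vm_fields is empty where /proc is unavailable.
def measure(label)
  elapsed = Benchmark.realtime { yield }
  status = "/proc/self/status"
  vm = File.readable?(status) ? parse_vm_fields(File.read(status)) : {}
  puts format("%-12s %.2fs %s", label, elapsed, vm)
  [elapsed, vm]
end

# Only runs when the test spreadsheet is present (assumed file name).
if File.exist?("performance-test.xlsx")
  require "creek"
  book = nil
  measure("Book.new")    { book = Creek::Book.new("performance-test.xlsx") }
  measure("simple_rows") { book.sheets[0].simple_rows.map(&:inspect) }
end
```

Run it once per creek version in a fresh process, since VmPeak and VmHWM are high-water marks for the life of the process and never go back down.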

azrazalea-debtbook · Feb 21 '24 03:02