rubyXL

Improve Performance

Open nmenag opened this issue 8 years ago • 6 comments

It takes forever to read the file and modify the file.

nmenag avatar Sep 24 '16 17:09 nmenag

Yes, the gem is very useful, but the performance is terrible

victorlcampos avatar Oct 20 '16 19:10 victorlcampos

I have similar problems with big files containing 10-20000+ rows. To me it seems the whole structure is read into memory, as opposed to lazily loading a sheet's rows. I have not looked at the internals yet, but maybe there is a chance to also read (stream) the rows in chunks to reduce the memory footprint. Besides, XML parsing is slow and there are a lot of structures to take care of... I hate Excel, but this gem took away some of my pain.

Just found this is a duplicate of #199

schorsch avatar Mar 03 '17 09:03 schorsch

The problem is definitely not about memory. It lies in the equality operator (==) of OOXMLObject, cf. my profiling:

 %self      total      self      wait     child    calls   name
 11.40    340.639    41.469     0.032   299.137 47407460  *RubyXL::OOXMLObjectInstanceMethods#==
 11.24    310.112    40.896     0.074   269.142 58447965  *Hash#each
 10.38     93.595    37.767     0.338    55.490 58507863   RubyXL::OOXMLObjectInstanceMethods#obtain_class_variable
  7.15     45.605    26.026     0.000    19.579 58507863   RubyXL::OOXMLObjectClassMethods#obtain_class_variable
  6.04    211.185    21.987     0.003   189.195 58435792  *Enumerable#all?
  5.38     19.579    19.579     0.000     0.000 58507863   Module#class_variable_get
  5.37    359.224    19.531     0.022   339.671    43503   Array#find_index
  5.14     18.693    18.693     0.000     0.000 105922309   Kernel#class
  2.48      9.026     9.026     0.000     0.000 47545191   Kernel#is_a?
  0.68      2.468     2.458     0.000     0.010 10704135   String#==

A naive (and still slightly buggy) optimization gives me a good performance improvement: from 2000 seconds to ... 30 seconds 👍 I think I can get up to a 100x improvement.

Meanwhile, avoiding fonts/styling/fill colors will keep the gem running smoothly.

The code seems easy to optimize. I've isolated the issue and I'll release a pull request soon.
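
To illustrate why a profile dominated by Array#find_index and == can be this slow, here is a minimal sketch. It does not use rubyXL and is not the fix discussed in this thread: StyleRecord, LinearRegistry and IndexedRegistry are hypothetical names invented for the example. It assumes the hot path looks up an existing record by scanning an array with ==, and compares that against a Hash keyed by the record's contents.

```ruby
# Hypothetical stand-in for a style record; NOT rubyXL's actual class.
class StyleRecord
  @comparisons = 0
  class << self
    attr_accessor :comparisons
  end

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs.freeze
  end

  # Content-based equality, counted so the cost becomes visible.
  def ==(other)
    self.class.comparisons += 1
    other.is_a?(StyleRecord) && attrs == other.attrs
  end
end

# Scans the whole array with == on every registration (O(n) per call).
class LinearRegistry
  def initialize
    @records = []
  end

  def register(record)
    index = @records.find_index { |existing| existing == record }
    return index if index

    @records << record
    @records.size - 1
  end
end

# Keeps a Hash from attribute contents to index (O(1) average per call).
class IndexedRegistry
  def initialize
    @records = []
    @index = {}
  end

  def register(record)
    @index[record.attrs] ||= begin
      @records << record
      @records.size - 1
    end
  end
end

# 2,000 registrations spread over 50 distinct styles.
records = Array.new(2_000) { |i| StyleRecord.new(font: "Arial", size: 10 + (i % 50)) }

StyleRecord.comparisons = 0
linear = LinearRegistry.new
linear_ids = records.map { |r| linear.register(r) }
linear_comparisons = StyleRecord.comparisons

StyleRecord.comparisons = 0
indexed = IndexedRegistry.new
indexed_ids = records.map { |r| indexed.register(r) }
indexed_comparisons = StyleRecord.comparisons

puts "linear:  #{linear_comparisons} calls to =="
puts "indexed: #{indexed_comparisons} calls to =="
```

Both registries hand out the same indices, but the indexed one never calls StyleRecord#== because the Hash compares the frozen attribute hashes directly. Whether such an index is safe in the real gem depends on whether records are mutated after registration, which this sketch does not address.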

anykeyh avatar Jun 23 '17 14:06 anykeyh

Awesome finding! Looking forward to it.

schorsch avatar Jun 23 '17 20:06 schorsch

@anykeyh: can you show me your test case? I just attempted some testing of my own and I wasn't able to see the same results; in fact, RubyXL::OOXMLObjectInstanceMethods#== is never called to begin with.

weshatheleopard avatar Jun 23 '17 23:06 weshatheleopard

Yes, I'll provide a test case, but for me it's called from:

"rubyXL/cell.rb:37:in `validate_worksheet'"=>3490,
"rubyXL/convenience_methods.rb:129:in `block in register_new_fill'"=>452,
"rubyXL/convenience_methods.rb:145:in `block in register_new_xf'"=>51557,
"rubyXL/convenience_methods.rb:137:in `block in register_new_font'"=>4330,
"rubyXL/convenience_methods.rb:173:in `block in modify_border'"=>1809

As I said above, in my case it's cell styling that is completely killing the performance.

Just a note: it's not related to reading and modifying, but to writing the output. I took this issue as an overall performance issue to discuss the perf problems I've encountered and the solutions I'll try to provide. Sorry if it wasn't clear ;-)
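
For anyone wanting to produce a similar per-call-site tally for their own workload, one possible approach is sketched below. The counts above may well have been gathered differently. Probe and register_new_probe are hypothetical names for illustration and are not part of rubyXL. The sketch counts, inside ==, which frame invoked it, using Ruby's built-in Kernel#caller_locations.

```ruby
# Hypothetical class standing in for an object whose == is suspected to be hot.
class Probe
  # Maps "file:line:in label" to the number of == calls made from there.
  CALL_SITES = Hash.new(0)

  attr_reader :value

  def initialize(value)
    @value = value
  end

  def ==(other)
    # caller_locations(1, 1) returns only the frame that invoked ==.
    site = caller_locations(1, 1).first
    CALL_SITES["#{site.path}:#{site.lineno}:in #{site.label}"] += 1
    other.is_a?(Probe) && value == other.value
  end
end

# Hypothetical lookup-or-append helper that triggers == through find_index.
def register_new_probe(list, probe)
  list.find_index { |existing| existing == probe } || (list << probe).size - 1
end

list = []
ids = [1, 2, 1, 3, 2].map { |v| register_new_probe(list, Probe.new(v)) }

# Print call sites, busiest first.
Probe::CALL_SITES.sort_by { |_site, count| -count }.each do |site, count|
  puts "#{count}\t#{site}"
end
```

The exact label text differs between Ruby versions, so the keys are best treated as human-readable hints rather than stable identifiers.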

anykeyh avatar Jun 24 '17 05:06 anykeyh