rubyXL
Improve Performance
It takes forever to read the file and modify the file.
Yes, the gem is very useful, but the performance is terrible.
I have similar problems with big files containing 10-20000+ rows. It seems to me that the whole structure is read into memory, as opposed to lazy-loading a sheet's rows. I have not looked at the internals yet, but maybe there is a way to read (stream) the rows in chunks to reduce the memory footprint. Besides, XML parsing is slow and there are a lot of structures to take care of. I hate Excel, but this gem took away some of my pain.
Just found this is a duplicate of #199
The problem is definitely not about memory. It lies in the == operator of OOXMLObject; cf. my profiling:
%self    total    self   wait    child      calls  name
11.40  340.639  41.469  0.032  299.137   47407460  *RubyXL::OOXMLObjectInstanceMethods#==
11.24  310.112  40.896  0.074  269.142   58447965  *Hash#each
10.38   93.595  37.767  0.338   55.490   58507863  RubyXL::OOXMLObjectInstanceMethods#obtain_class_variable
 7.15   45.605  26.026  0.000   19.579   58507863  RubyXL::OOXMLObjectClassMethods#obtain_class_variable
 6.04  211.185  21.987  0.003  189.195   58435792  *Enumerable#all?
 5.38   19.579  19.579  0.000    0.000   58507863  Module#class_variable_get
 5.37  359.224  19.531  0.022  339.671      43503  Array#find_index
 5.14   18.693  18.693  0.000    0.000  105922309  Kernel#class
 2.48    9.026   9.026  0.000    0.000   47545191  Kernel#is_a?
 0.68    2.468   2.458  0.000    0.010   10704135  String#==
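To illustrate the pattern the profile points at (relatively few Array#find_index calls fanning out into tens of millions of == calls), here is a minimal sketch. FakeStyle and register_linear are hypothetical stand-ins, not rubyXL's actual classes or methods; the sketch only assumes that each registration scans the existing list with a field-by-field ==.

```ruby
# Hypothetical stand-in for a style record; NOT rubyXL's actual class.
class FakeStyle
  class << self
    attr_accessor :comparisons
  end
  self.comparisons = 0

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end

  # Field-by-field equality, counted so the cost becomes visible.
  def ==(other)
    self.class.comparisons += 1
    other.is_a?(FakeStyle) && attrs.all? { |k, v| other.attrs[k] == v }
  end
end

# Hypothetical helper: return the index of an equal style, appending the
# style first if no equal one exists. This is a linear scan per call.
def register_linear(list, style)
  list.find_index { |s| s == style } || (list << style).size - 1
end

styles = []
200.times { |i| register_linear(styles, FakeStyle.new(font: i, fill: i % 7)) }

# 200 distinct styles cost 0 + 1 + ... + 199 comparisons.
puts FakeStyle.comparisons # prints 19900
```

With N distinct styles the number of == calls grows roughly as N squared, which would be consistent with the call counts in the profile if each styled cell triggers such a scan.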
A naive (and still slightly buggy) optimization gives me a good performance improvement: from 2000 seconds to ... 30 seconds 👍 I think I can reach a 100x improvement.
Meanwhile, avoiding fonts, styling, and fill colors will keep the gem running smoothly.
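The thread does not show the actual patch, so the following is only one possible shape such an optimization could take: replace the linear scan with a hash lookup keyed on the style's attributes. StyleRegistry and its methods are hypothetical names for illustration, not part of rubyXL.

```ruby
# Hypothetical registry; illustrative only, NOT rubyXL's API.
class StyleRegistry
  def initialize
    @styles = []  # registered attribute hashes, in insertion order
    @index  = {}  # attribute hash => position in @styles
  end

  # Returns the position of an equal style, adding it if it is missing.
  # The lookup is a single hash probe instead of a scan over all styles.
  def register(attrs)
    @index.fetch(attrs) do
      frozen = attrs.dup.freeze # freeze the key so it cannot be mutated later
      @styles << frozen
      @index[frozen] = @styles.size - 1
    end
  end

  def size
    @styles.size
  end
end

registry = StyleRegistry.new
a = registry.register({ font: "Arial", size: 10 })
b = registry.register({ size: 10, font: "Arial" }) # same content, other key order
c = registry.register({ font: "Arial", size: 12 })
puts [a, b, c].inspect
```

Ruby compares and hashes Hash keys by content regardless of key order, so a and b resolve to the same entry. A real fix would also have to handle styles that are mutated after registration, which this sketch sidesteps by freezing a copy.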
The code seems easy to optimize. I've isolated the issue and I'll release a pull request soon.
Awesome finding! Looking forward to it.
@anykeyh: can you show me your test case? I just attempted some testing of my own and wasn't able to reproduce your results; in fact, RubyXL::OOXMLObjectInstanceMethods#== is never called to begin with.
Yes, I'll provide a test case, but for me it's called from:
"rubyXL/cell.rb:37:in `validate_worksheet'"=>3490,
"rubyXL/convenience_methods.rb:129:in `block in register_new_fill'"=>452,
"rubyXL/convenience_methods.rb:145:in `block in register_new_xf'"=>51557,
"rubyXL/convenience_methods.rb:137:in `block in register_new_font'"=>4330,
"rubyXL/convenience_methods.rb:173:in `block in modify_border'"=>1809
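A call-site tally like the one above can be collected with a few lines of plain Ruby. How the numbers in this thread were actually gathered is not stated; the sketch below is one way to do it, using Kernel#caller inside an instrumented ==. Probe, CALL_SITES, site_a and site_b are hypothetical names.

```ruby
# Tally of "file:line:in method" strings => number of == calls from there.
CALL_SITES = Hash.new(0)

# Hypothetical class whose == records where it was called from.
class Probe
  def ==(other)
    # caller(1, 1) returns a one-element array describing the immediate caller.
    CALL_SITES[caller(1, 1).first] += 1
    equal?(other)
  end
end

# Two hypothetical call sites to tell apart.
def site_a(x, y)
  x == y
end

def site_b(x, y)
  x == y
end

p1 = Probe.new
p2 = Probe.new
3.times { site_a(p1, p2) }
site_b(p1, p2)

# Print the busiest call sites first.
CALL_SITES.sort_by { |_, count| -count }.each do |site, count|
  puts "#{site} => #{count}"
end
```

The same idea could be applied temporarily to the real == method while debugging, to confirm which code paths trigger the comparisons.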
As I said above, in my case it's cell styling that completely kills performance.
Just a note: it's not related to reading and modifying, but to writing the output. I used this issue as a general performance thread to discuss the problems I've encountered and the solutions I'll try to provide. Sorry if that wasn't clear ;-)