wikicloth
Extreme slowness with a large page
I'm using wikicloth as part of Gollum and a page is taking a really long time to generate, as in, 16+ seconds: https://github.com/gollum/gollum/issues/680
The short version of that bug report: some superlinear (possibly exponential) algorithm appears to be involved, such that page generation time is fine (~2 seconds) for half the page but jumps sharply to 16+ seconds for the full page.
I have wikicloth installed locally as a gem, but I don't know enough Ruby to write a simple script that uses wikicloth to convert a file, or to profile it, which would be the logical next step toward fixing this.
Do you have any suggestions? The wikicloth output looks great, but it's too slow to be usable. Thanks!
With the help of a Gollum developer I was able to profile this: https://github.com/gollum/gollum/issues/680
It should be pretty easy to replicate. Copy text from http://www.openflow.org/wk/index.php?title=OpenFlow_Tutorial&action=edit into a file.
Run this script:
# gem install perftools.rb wikicloth
require 'perftools'
require 'wikicloth'

ARGV.each do |input|
  puts "File: #{input}"
  PerfTools::CpuProfiler.start("/tmp/profile-#{input}") do
    WikiCloth::Parser.new({ :data => File.read(input) }).to_html
  end
end
Run the profile script with the file as input, then, after installing graphviz:
pprof.rb --pdf /tmp/[filename] > profile-[filename].pdf
...generates something like this: http://imgur.com/Tb2e3ZD,LZSxTh9#1
The profiling run confirms that some algorithm in WikiCloth is non-linear in the size of the input: in particular, the WikiCloth::WikiBuffer#add_char -> WikiCloth::WikiBuffer#check_globals sequence gets called 10x as often for 2x the input. Any idea what the issue might be?
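That call pattern is consistent with quadratic behavior: if the per-character handler re-examines the whole accumulated buffer each time check_globals runs, the total work grows with the square of the input. A minimal sketch of that effect (illustrative only, not WikiCloth's actual code; quadratic_scan is a made-up stand-in):

```ruby
# Illustrative only -- not WikiCloth's actual code. If a per-character
# handler rescans the accumulated buffer on every character, the total
# work is 1 + 2 + ... + n, i.e. O(n^2).
def quadratic_scan(text)
  buffer = +''
  ops = 0
  text.each_char do |ch|
    buffer << ch
    ops += buffer.length # stands in for re-checking the whole buffer
  end
  ops
end

puts quadratic_scan('a' * 1000) # => 500500
puts quadratic_scan('a' * 2000) # => 2001000, ~4x the work for 2x the input
```

Doubling the input roughly quadruples the work here, which would match the "fine for half the page, obscene for the full page" symptom.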
Thanks.
Yeah, unfortunately WikiCloth has always been kind of slow. I will add this file to the list of documents I test against, because 16 seconds is just obscene, but I don't really know if the situation will improve much in the short term.
Hi nricciar, I found that the source of the slowness was some invalid HTML tags; when I removed them, the render time dropped by a factor of 3. So this corner case might actually be easier to resolve than expected.
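For anyone hitting the same wall before a proper fix lands, one possible stopgap is to strip tags that aren't recognized markup before handing the text to WikiCloth. This is a hypothetical pre-filter, not part of WikiCloth's API, and the whitelist is an assumption you would tune for your content:

```ruby
# Hypothetical stopgap (not part of WikiCloth): drop HTML-like tags whose
# names are not on a small whitelist, since malformed tags triggered the
# slowdown. ALLOWED_TAGS is an assumed list -- adjust it for your pages.
ALLOWED_TAGS = %w[b i u s code pre nowiki ref table tr td th div span].freeze

def strip_unknown_tags(text)
  text.gsub(%r{</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>}) do |tag|
    ALLOWED_TAGS.include?(Regexp.last_match(1).downcase) ? tag : ''
  end
end

puts strip_unknown_tags('<b>kept</b> <bogus attr="x">dropped tag</bogus>')
# => <b>kept</b> dropped tag
```

Removing tags is lossy, so this is only a mitigation while the parser's handling of malformed markup is improved.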
Parsing the markup from http://en.wikipedia.org/wiki/List_of_female_tennis_players takes much longer (I didn't wait for it to finish).