
Extreme slowness with a large page

Open brandonheller opened this issue 11 years ago • 4 comments

I'm using wikicloth as part of Gollum and a page is taking a really long time to generate, as in, 16+ seconds: https://github.com/gollum/gollum/issues/680

The short version of that bug report is that some algorithm appears to scale non-linearly (possibly exponentially) with page size: generation time is fine (~2 sec) for half the page, but rises sharply to 16+ seconds for the full page.

I have wikicloth installed locally from a gem, but I don't know enough Ruby to write a simple script that uses wikicloth to convert a file, or to profile it, which would be the logical next step to fix this.

Do you have any suggestions? The wikicloth output looks great, but it's too slow to be usable. Thanks!

brandonheller, Apr 02 '13 21:04

With the help of a Gollum developer I was able to profile this: https://github.com/gollum/gollum/issues/680

It should be pretty easy to replicate. Copy text from http://www.openflow.org/wk/index.php?title=OpenFlow_Tutorial&action=edit into a file.

Run this script:

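# gem install perftools.rb wikicloth
require 'perftools'
require 'wikicloth'

# Profile the WikiCloth conversion of each file named on the command line.
ARGV.each do |input|
  puts "File: #{input}"
  # Use the basename so an input path with directories still maps to a valid /tmp file.
  profile_path = "/tmp/profile-#{File.basename(input)}"
  PerfTools::CpuProfiler.start(profile_path) do
    WikiCloth::Parser.new({ :data => File.read(input) }).to_html
  end
end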

Run the profile script with the file as input, then, after installing graphviz:

pprof.rb --pdf /tmp/profile-[filename] > profile-[filename].pdf

...generates something like this: http://imgur.com/Tb2e3ZD,LZSxTh9#1

The profiling run confirms that some algorithm in WikiCloth is non-linear in the size of the input. In particular, the WikiCloth::WikiBuffer#add_char -> WikiCloth::WikiBuffer#check_globals sequence gets called about 10x as often for 2x the input. Any idea what the issue might be?
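One rough way to check the non-linear growth without a profiler is to time the conversion on growing prefixes of the same file. The sketch below is only an illustration: time_prefixes and growth_ratios are hypothetical helper names, not part of WikiCloth, and the only WikiCloth call used is the Parser.new(...).to_html call from the script above. Cutting the text at an arbitrary character may split markup mid-tag, so treat the numbers as indicative only.

```ruby
require 'benchmark'

# Hypothetical helper: time a converter block on growing prefixes of the text.
# Returns an array of [prefix_length, seconds] pairs.
def time_prefixes(text, fractions = [0.25, 0.5, 0.75, 1.0])
  fractions.map do |fraction|
    prefix = text[0, (text.length * fraction).round]
    seconds = Benchmark.realtime { yield prefix }
    [prefix.length, seconds]
  end
end

# Hypothetical helper: ratio of each timing to the previous one.
# For a linear-time parser, doubling the input should roughly double the time.
def growth_ratios(timings)
  timings.each_cons(2).map { |(_, a), (_, b)| a.zero? ? nil : b / a }
end

# Only runs when a file name is given on the command line.
if ARGV.any?
  require 'wikicloth' # needed only for the real measurement
  text = File.read(ARGV.first)
  timings = time_prefixes(text) do |chunk|
    WikiCloth::Parser.new({ :data => chunk }).to_html
  end
  timings.each { |length, seconds| puts format('%8d chars  %.3f s', length, seconds) }
end
```

Ratios well above 2 between the half-page and full-page timings would be consistent with the jump described above.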

Thanks.

brandonheller, Apr 03 '13 01:04

Yeah, unfortunately WikiCloth has always been kinda slow. I will add this file to the list of documents I test because 16 seconds is just obscene, but I don't really know if the situation will improve much in the short term.

nricciar, Apr 08 '13 19:04

Hi nricciar, I found that the source of the slowness was some invalid HTML tags; when I removed them, the page generation time went down by a factor of 3. So this corner case might actually be a bit easier to resolve than expected.
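For anyone hunting for similar tags in their own pages, a rough scan for unbalanced HTML tags can narrow down candidates. This is a minimal sketch under stated assumptions: find_unbalanced_tags and VOID_TAGS are hypothetical names, the check looks only at structural balance (it does not know which tags WikiCloth itself accepts), and the thread does not say exactly which tags were invalid on this page.

```ruby
# Void elements never take a closing tag, so they are skipped (partial list).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: returns [line_number, tag_name, problem] entries for
# tags that are closed without a matching open tag on top of the stack, or
# that are still open at the end of the text.
def find_unbalanced_tags(text)
  problems = []
  stack = []
  text.each_line.with_index(1) do |line, line_number|
    line.scan(%r{<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>}) do |closing, name, self_closing|
      name = name.downcase
      next if VOID_TAGS.include?(name) || self_closing == '/'
      if closing.empty?
        stack.push([line_number, name])
      elsif stack.last && stack.last[1] == name
        stack.pop
      else
        problems << [line_number, name, :unexpected_close]
      end
    end
  end
  problems + stack.map { |line_number, name| [line_number, name, :never_closed] }
end

# Only runs when a file name is given on the command line.
if ARGV.any?
  find_unbalanced_tags(File.read(ARGV.first)).each do |line_number, name, problem|
    puts "line #{line_number}: <#{name}> #{problem}"
  end
end
```

Tags that span several lines or appear inside nowiki sections can produce false positives, so the output is a list of places to look rather than a verdict.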

brandonheller, Apr 08 '13 20:04

Parsing markup from http://en.wikipedia.org/wiki/List_of_female_tennis_players takes much longer (I didn't wait for it to finish).

djstrong, Mar 16 '15 16:03