commonmarker icon indicating copy to clipboard operation
commonmarker copied to clipboard

Slow performance on smaller documents

Open GUI opened this issue 1 year ago • 7 comments

I noticed that Commonmarker (v1.0.4) was performing worse than Kramdown for my own documents, which I wasn't expecting based on the benchmarks. I was able to replicate your benchmark results on my own computer (where commonmarker appears faster than Kramdown when using the default 11MB bencmark markdown file), but where I found a discrepancy was if I started benchmarking smaller documents, where then Commonmarker appears to be one of the slower options.

This slowdown for small documents is not the case for the v0.23.10 release of Commonmarker, so I think it's somehow tied to the switch to comrak. So I'm not sure if this is an issue with comrak or in the Ruby bindings, but I thought I'd start here since I stumbled into this when comparing Ruby libraries. I'm wondering if there's perhaps some overhead in initializing things or calling comrak that the large benchmark file maybe masks (since you'd be doing fewer iterations of calling Commanmarker repeatedly with a big document where more of the overhead is in the underlying Markdown parsing)? But I'm happy to shift this conversation over to comrak if you believe this is an issue on their end.

Here's what I'm seeing on my M1 MacBook Pro (observed both in Linux Docker images and directly on the Mac, but all of these tests would be using aarch64 binaries if that makes a difference):

v1.0.4 release with default benchinput.md input (11MB file)

input size = 11064832 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Warming up --------------------------------------
           redcarpet     2.000 i/100ms
commonmarker with to_html
                         1.000 i/100ms
            kramdown     1.000 i/100ms
Calculating -------------------------------------
           redcarpet     23.045 (± 4.3%) i/s -    116.000 in   5.051665s
commonmarker with to_html
                          5.682 (± 0.0%) i/s -     29.000 in   5.119065s
            kramdown      0.401 (± 0.0%) i/s -      3.000 in   7.493124s

Comparison:
           redcarpet:       23.0 i/s
commonmarker with to_html:        5.7 i/s - 4.06x  slower
            kramdown:        0.4 i/s - 57.44x  slower

Roughly in line with the results from the readme where commonmarker is ~4x slower than redcarpet and Kramdown is ~60x slower than redcarpet.

v1.0.4 release with lorem1.md input (3.7KB file)

I used this lorem1.md sample file for a smaller benchmark file that is more representative of the files I'm passing through the library.

input size = 3789 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Warming up --------------------------------------
           redcarpet     8.262k i/100ms
commonmarker with to_html
                        65.000 i/100ms
            kramdown   198.000 i/100ms
Calculating -------------------------------------
           redcarpet     85.995k (± 1.9%) i/s -    437.886k in   5.093855s
commonmarker with to_html
                        653.203 (± 1.4%) i/s -      3.315k in   5.075908s
            kramdown      1.946k (± 3.7%) i/s -      9.900k in   5.093986s

Comparison:
           redcarpet:    85995.3 i/s
            kramdown:     1946.4 i/s - 44.18x  slower
commonmarker with to_html:      653.2 i/s - 131.65x  slower

As you can see, with this smaller file Kramdown is now 44x slower than Redcarpet but Commonmarker is 131x slower than Redcarpet (and about 3x slower than Kramdown).

v0.23.10 release with default benchinput.md input (11MB file)

I then switched over to the libcmark-gfm based release of this gem to see how these same test files performed on the benchmark suite in that branch:

input size = 11064832 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Warming up --------------------------------------
           redcarpet     2.000 i/100ms
commonmarker with to_html
                         1.000 i/100ms
commonmarker with to_xml
                         1.000 i/100ms
commonmarker with ruby HtmlRenderer
                         1.000 i/100ms
commonmarker with render_doc.to_html
                         1.000 i/100ms
            kramdown     1.000 i/100ms
Calculating -------------------------------------
           redcarpet     26.032 (± 3.8%) i/s -    130.000 in   5.015141s
commonmarker with to_html
                         17.601 (± 5.7%) i/s -     88.000 in   5.027352s
commonmarker with to_xml
                         17.368 (± 5.8%) i/s -     87.000 in   5.022493s
commonmarker with ruby HtmlRenderer
                          3.156 (± 0.0%) i/s -     16.000 in   5.142554s
commonmarker with render_doc.to_html
                         14.082 (±21.3%) i/s -     67.000 in   5.045525s
            kramdown      0.446 (± 0.0%) i/s -      3.000 in   6.727232s

Comparison:
           redcarpet:       26.0 i/s
commonmarker with to_html:       17.6 i/s - 1.48x  slower
commonmarker with to_xml:       17.4 i/s - 1.50x  slower
commonmarker with render_doc.to_html:       14.1 i/s - 1.85x  slower
commonmarker with ruby HtmlRenderer:        3.2 i/s - 8.25x  slower
            kramdown:        0.4 i/s - 58.37x  slower

With the large file Commonmarker is around 1.5x slower than Redcarpet. This also shows that Commonmarker v1.0.4 is about 2-3x slower than Commonmarker v0.23.10 for this type of large file, for whatever that's worth, but I realize you may have expected some performance differences with the switch in libraries.

v0.23.10 release with lorem1.md input (3.7KB file)

input size = 3789 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Warming up --------------------------------------
           redcarpet    10.071k i/100ms
commonmarker with to_html
                         8.733k i/100ms
commonmarker with to_xml
                         9.069k i/100ms
commonmarker with ruby HtmlRenderer
                         2.574k i/100ms
commonmarker with render_doc.to_html
                         7.132k i/100ms
            kramdown   132.000 i/100ms
Calculating -------------------------------------
           redcarpet     94.489k (± 3.8%) i/s -    473.337k in   5.016736s
commonmarker with to_html
                         90.672k (± 1.6%) i/s -    454.116k in   5.009576s
commonmarker with to_xml
                         90.453k (± 1.8%) i/s -    453.450k in   5.014733s
commonmarker with ruby HtmlRenderer
                         25.674k (± 3.7%) i/s -    128.700k in   5.020167s
commonmarker with render_doc.to_html
                         72.578k (± 4.2%) i/s -    363.732k in   5.020900s
            kramdown      1.322k (± 3.8%) i/s -      6.732k in   5.098611s

Comparison:
           redcarpet:    94488.7 i/s
commonmarker with to_html:    90671.7 i/s - same-ish: difference falls within error
commonmarker with to_xml:    90453.4 i/s - same-ish: difference falls within error
commonmarker with render_doc.to_html:    72577.6 i/s - 1.30x  slower
commonmarker with ruby HtmlRenderer:    25673.7 i/s - 3.68x  slower
            kramdown:     1322.3 i/s - 71.46x  slower

Here you can see the bigger difference between the old and new releases of Commonmarker, since Commonmarker is nearly as fast as Redcarpet with this small file on under v0.23.10, but significantly slower in v1.0.4.

Let me know if you have any questions, but thanks for all your work on this gem!

GUI avatar Mar 12 '24 04:03 GUI

Thank you for these numbers and the test file. I had a similar experience while working on updating the benchmarks in https://github.com/gjtorikian/commonmarker/pull/276/commits/0224ec83a46419c63d9e18ccf6004f5ca260cad9.

To be honest, I have not really looked into the performance of this gem, but I am starting to come around to it being the next thing to look at. I don't know when I'll have the time to dedicate to measuring the performance. I can say that it's pretty likely that any potential optimizations can be made in comrak; Commonmarker is just a really dumb wrapper around that lib.

gjtorikian avatar Mar 12 '24 13:03 gjtorikian

Just stopping by this issue and thought I'd re-run the lorem1.md comparison today (2.4.1):

input size = 3789 bytes

ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
Warming up --------------------------------------
  Markly.render_html     6.572k i/100ms
Markly::Node#to_html     6.634k i/100ms
Commonmarker.to_html    83.000 i/100ms
Commonmarker::Node.to_html
                        80.000 i/100ms
Kramdown::Document#to_html
                       167.000 i/100ms
Calculating -------------------------------------
  Markly.render_html     66.468k (± 2.3%) i/s -    335.172k in   5.045286s
Markly::Node#to_html     66.903k (± 2.2%) i/s -    338.334k in   5.059475s
Commonmarker.to_html    833.183 (± 0.4%) i/s -      4.233k in   5.080602s
Commonmarker::Node.to_html
                        804.897 (± 0.5%) i/s -      4.080k in   5.069132s
Kramdown::Document#to_html
                          1.683k (± 0.7%) i/s -      8.517k in   5.061121s

Comparison:
Markly::Node#to_html:    66903.4 i/s
  Markly.render_html:    66468.1 i/s - same-ish: difference falls within error
Kramdown::Document#to_html:     1682.9 i/s - 39.75x  slower
Commonmarker.to_html:      833.2 i/s - 80.30x  slower
Commonmarker::Node.to_html:      804.9 i/s - 83.12x  slower

Hard to do an exact apples-to-apples comparison with what we saw before, but for a bit of a sign of progress, Commonmarker.to_html is now 2.02x slower compared to Kramdown::Document#to_html, whereas 1.0.4 was 2.98x.

kivikakk avatar Sep 29 '25 10:09 kivikakk

I ended up making some tweaks locally as to how options are parsed and was able to speed this up quite a bit:

input size = 3789 bytes

ruby 3.4.0dev (2024-12-25 master f450108330) +PRISM [arm64-darwin24]
Warming up --------------------------------------
  Markly.render_html    10.219k i/100ms
Markly::Node#to_html    10.264k i/100ms
Commonmarker.to_html     8.172k i/100ms
Commonmarker::Node.to_html
                         5.071k i/100ms
Kramdown::Document#to_html
                       315.000 i/100ms
Calculating -------------------------------------
  Markly.render_html     99.062k (± 2.2%) i/s   (10.09 μs/i) -    500.731k in   5.057165s
Markly::Node#to_html    100.237k (± 2.3%) i/s    (9.98 μs/i) -    502.936k in   5.020076s
Commonmarker.to_html     82.305k (± 1.8%) i/s   (12.15 μs/i) -    416.772k in   5.065346s
Commonmarker::Node.to_html
                         51.671k (± 2.2%) i/s   (19.35 μs/i) -    258.621k in   5.007502s
Kramdown::Document#to_html
                          3.301k (± 1.9%) i/s  (302.93 μs/i) -     16.695k in   5.059258s

Comparison:
Markly::Node#to_html:   100236.9 i/s
  Markly.render_html:    99062.3 i/s - same-ish: difference falls within error
Commonmarker.to_html:    82304.8 i/s - 1.22x  slower
Commonmarker::Node.to_html:    51670.9 i/s - 1.94x  slower
Kramdown::Document#to_html:     3301.1 i/s - 30.36x  slower

The option parsing on the Rust side of things has long been an annoyance for me, because I just slapped some code together when first reelasing the gem and always promised to "make it better." I'll spend a little more time cleaning this up and cut a release with the perf improvements.

gjtorikian avatar Sep 29 '25 15:09 gjtorikian

Good lord! That's nice :D

kivikakk avatar Sep 30 '25 00:09 kivikakk

Yeah…but I think either the benchmark is bunk OR a lot of improvements can come if this gem let you instantiate a class.

Basically the options parsing is taking up a lot of time. If the options were parsed once (Commonmarker.new(options: opts)) outside of the benchmark, then we’d see a lot more improvement in the timing of the actual commonmarker -> html process. I’m not sure I’m ready to cut a new major version bump just for that change though…

gjtorikian avatar Sep 30 '25 15:09 gjtorikian

I wonder if we couldn't have both? Commonmarker.to_html and Commonmarker.parse continue to work as-is, but we add Commonmarker#to_html and Commonmarker#parse. (And perhaps the former two can be changed to just call new(options:).to_html etc.) Then it probably doesn't need a major bump.

If we imagine a typical long-running server use-case where Commonmarker.to_html is called at the bottom of some pipeline, the options are probably never changing (per call-site, anyway), and a lot of the inputs are probably small. In which case, might as well save the effort? I imagine the other major use case is probably being run in a script, in which case process creation dominates and it's irrelevant.

The nice thing about amortising the option parsing this way is that it doesn't really matter how much time it takes to parse!

Happy to spike this out if you don't have the time, btw :heart:

kivikakk avatar Oct 01 '25 02:10 kivikakk

Oh, I totally did that, something like:

def initialize(options)
  @options = options
end

def parse
  commonmark_parse(text, options: @options)
end

class << self
   def parse(text, options)
      Commonmarker.new(options).parse(text)
   end
end

The problem I ran into was the Commonmarker is a module right now, so I'd have to switch it to a class (breaking the API, technically), or create some other instantiable class (Commonmarker::Render?), and I didn't like either of those options. 😓

If we imagine a typical long-running server use-case where Commonmarker.to_html is called at the bottom of some pipeline, the options are probably never changing (per call-site, anyway), and a lot of the inputs are probably small. In which case, might as well save the effort?

Yep, 💯 . Maybe the version semver leap is enough to justify this.

gjtorikian avatar Oct 01 '25 14:10 gjtorikian