rouge icon indicating copy to clipboard operation
rouge copied to clipboard

Speed improvement when formatting HTML

Open robotmay opened this issue 4 years ago • 4 comments

A slight performance improvement when formatting HTML. This will further reduce the number of new strings being created, and also prevents the regex being instantiated multiple times.

Briefly benchmarked this by adding 50,000 extra tests and the gain is marginal but there:

Original

Finished in 18.505466s, 57.2263 runs/s, 2827.1107 assertions/s.
1059 runs, 52317 assertions, 0 failures, 0 errors, 0 skips

This PR

Finished in 17.132288s, 61.8131 runs/s, 3053.7077 assertions/s.
1059 runs, 52317 assertions, 0 failures, 0 errors, 0 skips

I do however have a question about the safety of using gsub! to replace the tokens in the existing string here. I am assuming that the strings would already have been duplicated from the original user input by this point (having gone through a lexer). I'd like to clarify this with someone more experienced :smile:

robotmay avatar Jan 28 '21 16:01 robotmay

@robotmay I think gsub! will be fine. We have one other use of it in Rouge. The biggest issue would be if the use of gsub! caused the raw text output of Rouge to differ from the input. Do you have an example I can run through the debugger that hits this code to test it?

Also, I fixed CI. Can you please rebase and push to trigger a new run?

dblessing avatar Feb 12 '21 20:02 dblessing

Uh oh. It does appear to break the output. All the special characters become encoded - < becomes &lt;.

You can run bundle exec rackup locally and visit localhost:9292/html to see how your code change affects the visual output. Holler if I can help debug further. Thanks for digging in on performance.

Screen Shot 2021-02-12 at 2 50 33 PM

dblessing avatar Feb 12 '21 20:02 dblessing

@dblessing That's expected behaviour with the HTML formatter - so that you can put it into an HTML document safely.

jneen avatar Feb 14 '21 05:02 jneen

Ohhhh wait it's double-encoding it somehow nevermind.

jneen avatar Feb 14 '21 05:02 jneen