rouge
rouge copied to clipboard
Speed improvement when formatting HTML
A slight performance improvement when formatting HTML. This will further reduce the number of new strings being created, and also prevents the regex being instantiated multiple times.
Briefly benchmarked this by adding 50,000 extra tests and the gain is marginal but there:
Original
Finished in 18.505466s, 57.2263 runs/s, 2827.1107 assertions/s.
1059 runs, 52317 assertions, 0 failures, 0 errors, 0 skips
This PR
Finished in 17.132288s, 61.8131 runs/s, 3053.7077 assertions/s.
1059 runs, 52317 assertions, 0 failures, 0 errors, 0 skips
I do however have a question about the safety of using gsub!
to replace the tokens in the existing string here. I am assuming that the strings would already have been duplicated from the original user input by this point (having gone through a lexer). I'd like to clarify this with someone more experienced :smile:
@robotmay I think gsub!
will be fine. We have one other use of it in Rouge. The biggest issue would be if the use of gsub!
caused the raw text output of Rouge to differ from the input. Do you have an example I can run through the debugger that hits this code to test it?
Also, I fixed CI. Can you please rebase and push to trigger a new run?
Uh oh. It does appear to break the output. All the special characters become encoded - <
becomes <
.
You can run bundle exec rackup
locally and visit localhost:9292/html
to see how your code change affects the visual output. Holler if I can help debug further. Thanks for digging in on performance.
@dblessing That's expected behaviour with the HTML formatter - so that you can put it into an HTML document safely.
Ohhhh wait it's double-encoding it somehow nevermind.