markdown-it-py
markdown-it-py copied to clipboard
Improving performance
Describe the problem/need and solution
Problem / Idea According to updated benchmark results (https://github.com/executablebooks/markdown-it-py/pull/196) mistletoe beat us. This is embarrassing. Haha, no, hats off to mistletoe authors!
Solution The profiler (https://github.com/executablebooks/markdown-it-py/pull/197) reveals that we spend a significant portion of execution time, around one third, converting string characters to ints here.
As we've researched before, this is a performance hack in upstream JavaScript, but for us it hurts performance in a major way. It also makes the code slightly less readable. We've already resorted to caching these int sequences (diverging from JS upstream), which is basically a performance hack on top of a failed performance hack, where as a result performance still suffers. I'd be interested to move to using the str type only.
The naive way to implement this will break basically all parser extensions I believe (mdit-py-plugins). What we could do is a deprecation period for srcCharCode, where:
- the core library moves to using
srcfor increased performance - accessing
srcCharCodeemitsDeprecationWarnings srcCharCodeis generated lazily, only when accessed. This means that the performance loss only occurs when using deprecated extension, not when using core markdown-it
Benefit Increased performance.
Guide for implementation
No response
Tasks and updates
No response
What do you think @chrisjsewell ? If you're worried about the breaking change, we could make the deprecation period very long or infinite.
I think it sounds reasonable 👍
Great, I can work on this.
Do you agree with this comment that we should just scrap https://github.com/executablebooks/markdown-it-py/pull/190 ? Or do you want an out-of-range safe wrapper for src, something like
def srcAt(self, idx: int) -> Optional[str]:
try:
return self._src[idx]
except IndexError:
return None
Personally I wouldn't do this for reasons explained in the comment I linked.
Another way to increase performance would be to fill in whatever type annotations that are missing (enforced by the disallow_untyped_defs = true mypy setting) and publishing binary wheels built with mypyc.
What do you think @chrisjsewell ?