markdown-it-py icon indicating copy to clipboard operation
markdown-it-py copied to clipboard

Improving performance

Open hukkin opened this issue 3 years ago • 4 comments

Describe the problem/need and solution

Problem / Idea According to updated benchmark results (https://github.com/executablebooks/markdown-it-py/pull/196) mistletoe beat us. This is embarrassing. Haha, no, hats off to mistletoe authors!

Solution The profiler (https://github.com/executablebooks/markdown-it-py/pull/197) reveals that we spend a significant portion of execution time, around one third, converting string characters to ints here.

As we've researched before, this is a performance hack in upstream JavaScript, but for us it hurts performance in a major way. It also makes the code slightly less readable. We've already resorted to caching these int sequences (diverging from JS upstream), which is basically a performance hack on top of a failed performance hack, where as a result performance still suffers. I'd be interested to move to using the str type only.

The naive way to implement this will break basically all parser extensions I believe (mdit-py-plugins). What we could do is a deprecation period for srcCharCode, where:

  • the core library moves to using src for increased performance
  • accessing srcCharCode emits DeprecationWarnings
  • srcCharCode is generated lazily, only when accessed. This means that the performance loss only occurs when using deprecated extension, not when using core markdown-it

Benefit Increased performance.

Guide for implementation

No response

Tasks and updates

No response

hukkin avatar Feb 05 '22 09:02 hukkin

What do you think @chrisjsewell ? If you're worried about the breaking change, we could make the deprecation period very long or infinite.

hukkin avatar Feb 08 '22 11:02 hukkin

I think it sounds reasonable 👍

chrisjsewell avatar Feb 10 '22 12:02 chrisjsewell

Great, I can work on this.

Do you agree with this comment that we should just scrap https://github.com/executablebooks/markdown-it-py/pull/190 ? Or do you want an out-of-range safe wrapper for src, something like

def srcAt(self, idx: int) -> Optional[str]:
    try:
        return self._src[idx]
    except IndexError:
        return None

Personally I wouldn't do this for reasons explained in the comment I linked.

hukkin avatar Feb 10 '22 12:02 hukkin

Another way to increase performance would be to fill in whatever type annotations that are missing (enforced by the disallow_untyped_defs = true mypy setting) and publishing binary wheels built with mypyc.

What do you think @chrisjsewell ?

hukkin avatar May 08 '22 20:05 hukkin