Optimize Cursor implementation

Open colinodell opened this issue 3 years ago • 2 comments

It may be possible to optimize the Cursor implementation by relying more heavily on byte positions internally than character positions. It's possible that character positions could be eliminated completely if they aren't entirely needed by external code, or perhaps we could track both for convenience but only rely on byte positions internally.
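To illustrate the distinction, here is a minimal Python sketch (not code from the library) showing how a character position and a byte position diverge as soon as the input contains multi-byte UTF-8 characters. The helper name `char_to_byte_offset` and the sample string are hypothetical and chosen only for illustration.

```python
def char_to_byte_offset(text: str, char_index: int) -> int:
    """Return the UTF-8 byte offset of the character at char_index.

    Hypothetical helper: it re-encodes the prefix, so each call costs
    O(n) in the prefix length. Tracking byte positions directly would
    avoid this kind of repeated conversion.
    """
    return len(text[:char_index].encode("utf-8"))


# Sample input with two 2-byte characters (U+00E9 and U+00F6).
text = "h\u00e9llo *w\u00f6rld*"
data = text.encode("utf-8")

char_pos = text.index("*")   # position counted in characters
byte_pos = data.index(b"*")  # position counted in bytes

print(char_pos, byte_pos)
```

The two positions differ by one here because the 2-byte character before the first `*` shifts every later byte offset. ASCII syntax characters can still be located directly in the byte string without decoding anything.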

colinodell avatar Jun 19 '21 14:06 colinodell

This seems to be why my page takes a little longer to load with multi-byte characters. Is there a specific reason mbstring is needed to process Markdown? IIRC the syntax is all ASCII...

live627 avatar Mar 22 '23 13:03 live627

Although you're correct that the syntax is ASCII, how that syntax is interpreted depends on the context where it is used. The CommonMark specification says that Unicode whitespace and punctuation characters are significant when determining that context. For example:

A single _ character can close emphasis iff it is part of a right-flanking delimiter run and either (a) not part of a left-flanking delimiter run or (b) part of a left-flanking delimiter run followed by a Unicode punctuation character.

(emphasis added)

So we do need the ability to parse individual Unicode codepoints to properly handle the syntax - I guess the question is "how do we best do that?"
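One possible approach, sketched in Python rather than taken from the library: keep byte positions internally and decode a single code point only where the rule needs it, such as the character right after a delimiter run. The function names below are hypothetical. The sketch assumes valid UTF-8 and a position that falls on a code point boundary. The punctuation test is an approximation (ASCII punctuation plus the Unicode "P" general categories); the exact definition depends on the spec version.

```python
import string
import unicodedata


def codepoint_at(data: bytes, byte_pos: int) -> str:
    """Decode the single UTF-8 code point starting at byte_pos.

    Assumes data is valid UTF-8 and byte_pos is on a code point boundary.
    """
    first = data[byte_pos]
    if first < 0x80:             # 0xxxxxxx: 1-byte (ASCII)
        length = 1
    elif first >> 5 == 0b110:    # 110xxxxx: 2-byte sequence
        length = 2
    elif first >> 4 == 0b1110:   # 1110xxxx: 3-byte sequence
        length = 3
    else:                        # 11110xxx: 4-byte sequence
        length = 4
    return data[byte_pos:byte_pos + length].decode("utf-8")


def is_unicode_punctuation(ch: str) -> bool:
    """Approximate check: ASCII punctuation or a 'P' general category."""
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


# "_foo_" followed by U+3002 (ideographic full stop, category Po).
data = "_foo_\u3002".encode("utf-8")
closing_underscore = data.rindex(b"_")             # byte position 4
following = codepoint_at(data, closing_underscore + 1)

print(is_unicode_punctuation(following))
```

Only the one code point next to the delimiter is decoded here; the rest of the input is scanned as bytes.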

colinodell avatar Mar 22 '23 13:03 colinodell