comrak

parsing of `**A*B*C*` doesn't match CM-dingus

Open mikeando opened this issue 2 years ago • 2 comments

comrak currently parses `**A*B*C*` as `<p>**A<em>B</em>C*</p>`, whereas the CommonMark dingus gives `<p>*<em>A<em>B</em>C</em></p>`. (My implementation agrees with the dingus.)

mikeando • May 21 '22 12:05

Here is the same input, to see how GitHub renders it:

`**A*B*C*`

Looking at the preview and pulling up the developer tools, I get:

`<p dir="auto">**A<em>B</em>C*</p>`

which shows that GitHub's behaviour matches comrak's.

So I'm not sure what we'd want to do here.

mikeando • May 21 '22 12:05

After a careful reading of the spec, I think the dingus is correct.

  1. First we consider `**`. It can't end emphasis, as there are no earlier delimiters.
  2. Next we consider the `*` in `A*B`. It is both left- and right-flanking, so it can start or end emphasis. Looking backwards from there, we find the opening `**`. This can start emphasis, but we're not allowed to use it, since the lengths of `**` and `*` add up to 3. There are no earlier entries, so we move on.
  3. Next we consider the `*` in `B*C`, which is again both left- and right-flanking. Searching backwards, we hit the `*` in `A*B`; the sum of the lengths is not 3, so we can use it. This gives `**A<em>B</em>C*`. We move on.
  4. Finally, we reach the closing `*`. It is only right-flanking, so it can only end emphasis. Searching backwards, we find the initial `**`. The `**` can only start emphasis and the final `*` can only end it, so the sum-to-3 rule does not apply, and they match, giving `*<em>A<em>B</em>C</em>`.
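The walkthrough above can be checked mechanically. Below is a minimal Rust sketch of the spec's emphasis-matching procedure, restricted to `*` runs. It is not comrak's code, and every name in it (`tokenize`, `rule_of_three_ok`, `emphasis_to_html`) is hypothetical. It also assumes a few simplifications: no `_`, links, code spans or escapes; ASCII punctuation stands in for Unicode punctuation; and the spec's `openers_bottom` bookkeeping, which the spec presents as an optimisation, is omitted, so each closer scans all the way back.

```rust
// Sketch only: `*` runs, no `_`/links/code spans/escapes, ASCII punctuation,
// no `openers_bottom` bookkeeping. All names are hypothetical, not comrak's.
enum Item {
    Text(String),
    // A `*` run: stars left, original run length, flanking flags.
    Delim { count: usize, orig: usize, can_open: bool, can_close: bool },
}

fn render(item: &Item) -> String {
    match item {
        Item::Text(s) => s.clone(),
        Item::Delim { count, .. } => "*".repeat(*count), // leftover stars stay literal
    }
}

// Split input into text and `*` runs. Left-flanking runs can open, right-flanking
// runs can close; the start and end of the input count as whitespace.
fn tokenize(input: &str) -> Vec<Item> {
    let chars: Vec<char> = input.chars().collect();
    let punct = |c: char| c.is_ascii_punctuation();
    let mut items = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let start = i;
        let star = chars[i] == '*';
        while i < chars.len() && (chars[i] == '*') == star {
            i += 1;
        }
        if !star {
            items.push(Item::Text(chars[start..i].iter().collect()));
            continue;
        }
        let before = if start == 0 { ' ' } else { chars[start - 1] };
        let after = if i == chars.len() { ' ' } else { chars[i] };
        let left = !after.is_whitespace()
            && (!punct(after) || before.is_whitespace() || punct(before));
        let right = !before.is_whitespace()
            && (!punct(before) || after.is_whitespace() || punct(after));
        let n = i - start;
        items.push(Item::Delim { count: n, orig: n, can_open: left, can_close: right });
    }
    items
}

// "Sum to 3" rule: if either run can both open and close, the original lengths
// must not sum to a multiple of 3 (unless both are multiples of 3).
fn rule_of_three_ok(o_len: usize, o_close: bool, c_len: usize, c_open: bool) -> bool {
    !((o_close || c_open) && c_len % 3 != 0 && (o_len + c_len) % 3 == 0)
}

fn emphasis_to_html(input: &str) -> String {
    let mut items = tokenize(input);
    let mut j = 0;
    while j < items.len() {
        // Only a run that can close (and has stars left) triggers a search.
        let (c_count, c_len, c_open) = match &items[j] {
            Item::Delim { count, orig, can_open, can_close } if *can_close && *count > 0 => {
                (*count, *orig, *can_open)
            }
            _ => { j += 1; continue; }
        };
        // Nearest earlier run that can open and passes the rule of 3.
        let found = (0..j).rev().find_map(|k| match &items[k] {
            Item::Delim { count, orig, can_open, can_close }
                if *can_open && *count > 0 && rule_of_three_ok(*orig, *can_close, c_len, c_open) =>
            {
                Some((k, *count))
            }
            _ => None,
        });
        let (k, o_count) = match found {
            Some(pair) => pair,
            None => { j += 1; continue; } // no opener: the run stays literal (or a later opener)
        };
        let used = if o_count >= 2 && c_count >= 2 { 2 } else { 1 };
        let tag = if used == 2 { "strong" } else { "em" };
        // Wrap everything between opener and closer; runs in between become literal.
        let inner: String = items.drain(k + 1..j).map(|it| render(&it)).collect();
        items.insert(k + 1, Item::Text(format!("<{tag}>{inner}</{tag}>")));
        // Take the used stars from both runs; the closer now sits at k + 2.
        for idx in [k, k + 2] {
            if let Item::Delim { count, .. } = &mut items[idx] {
                *count -= used;
            }
        }
        j = if c_count > used { k + 2 } else { k + 3 }; // revisit closer if stars remain
    }
    items.iter().map(render).collect()
}

fn main() {
    for input in ["**A*B*C*", "**A*"] {
        println!("{input} -> {}", emphasis_to_html(input));
    }
}
```

Traced by hand, `**A*B*C*` follows the four steps above and yields `*<em>A<em>B</em>C</em>`, the dingus output, while `**A*` yields `*<em>A</em>`.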

I guess we might end up with comrak's/GFM's behaviour if we treated that final `*` as able to both start and end emphasis, in which case the sum-to-3 rule would apply.

However, were that the case, we should see the same issue with the simpler `**A*`, which comrak, the dingus, and my code all parse as `*<em>A</em>`.
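The hypothesis in the last two paragraphs turns on one arithmetic check. This small Rust sketch encodes the rule as the spec states it; the function name `rule_of_three_forbids` is hypothetical and not taken from comrak. It shows that the leading `**` and the final `*` may pair when the `*` can only close, but not if it could also open.

```rust
// Hypothetical helper: true when the spec's "multiple of 3" rule forbids
// pairing an opener run with a closer run (lengths are original run lengths).
fn rule_of_three_forbids(opener_len: usize, opener_can_close: bool,
                         closer_len: usize, closer_can_open: bool) -> bool {
    // Only applies when one side could both open and close; the
    // `closer_len % 3 != 0` clause encodes "unless both are multiples of 3".
    (opener_can_close || closer_can_open)
        && closer_len % 3 != 0
        && (opener_len + closer_len) % 3 == 0
}

fn main() {
    // Leading `**` (opens only) vs. a final `*` that only closes: allowed.
    println!("{}", rule_of_three_forbids(2, false, 1, false)); // false
    // Same pair if the final `*` could also open: 2 + 1 = 3, so forbidden.
    println!("{}", rule_of_three_forbids(2, false, 1, true)); // true
}
```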

mikeando • May 21 '22 13:05

Indeed, you are quite right: it looks like the spec always had this implication, but there was never an example that spelled it out. cmark upstream used to get this wrong, and so cmark-gfm (and thus Comrak) followed suit. cmark upstream addressed this bug in https://github.com/commonmark/cmark/commit/dc9366c1a9be4f6c6711556dc175b2583152acd6, so it should be a similarly simple fix in Comrak.

A fix will be forthcoming. Thanks so much!

kivikakk • Jan 01 '23 10:01