Emphasis intersection bug?

Open aidantwoods opened this issue 8 years ago • 9 comments

Given the following markdown:

**strong* still strong**

The online reference parser gives this output:

<p><em><em>strong</em> still strong</em>*</p>

However, using rule 15 (just below http://spec.commonmark.org/0.27/#can-open-emphasis)

  15. When two potential emphasis or strong emphasis spans overlap, so that the second begins before the first ends and ends after the first ends, the first takes precedence. Thus, for example, *foo _bar* baz_ is parsed as <em>foo _bar</em> baz_ rather than *foo <em>bar* baz</em>.

I think the output should be this

<p><strong>strong* still strong</strong></p>

Instead, the parser is behaving more like it would when faced with rule 16

  16. When there are two potential emphasis or strong emphasis spans with the same closing delimiter, the shorter one (the one that opens later) takes precedence. Thus, for example, **foo **bar baz** is parsed as **foo <strong>bar baz</strong> rather than <strong>foo **bar baz</strong>.

Even though these emphasis and strong emphasis spans do not have the same closing delimiter (so this rule should not apply).


Note that I am assuming that the phrase same closing delimiter (which is not formally defined) is referring to a delimiter run as categorised by its starting position in the string (this holds for the example given).
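For reference, the reported output can be reproduced with a minimal sketch of the delimiter-stack pass described in the spec's appendix on parsing nested emphasis. This is not the reference implementation: it handles only `*`, ignores backslash escapes, `_`, links, and the multiple-of-3 restriction, and uses ASCII punctuation only. The names `tokenize`, `find_opener`, and `render` are hypothetical.

```python
import re
import string

PUNCT = set(string.punctuation)  # assumption: ASCII punctuation only

def tokenize(text):
    """Split text into plain strings and '*' delimiter runs with flanking flags."""
    tokens = []
    for m in re.finditer(r"\*+|[^*]+", text):
        run = m.group()
        if not run.startswith("*"):
            tokens.append(run)
            continue
        # Treat the start and end of the line as whitespace.
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        left = not after.isspace() and (
            after not in PUNCT or before.isspace() or before in PUNCT)
        right = not before.isspace() and (
            before not in PUNCT or after.isspace() or after in PUNCT)
        # For '*', a left-flanking run can open and a right-flanking run can close.
        tokens.append({"n": len(run), "open": left, "close": right})
    return tokens

def as_text(token):
    """Render a token literally; unused delimiters fall back to plain asterisks."""
    return token if isinstance(token, str) else "*" * token["n"]

def find_opener(tokens, closer_index):
    """Walk backwards from the closer to the nearest run that can still open."""
    for j in range(closer_index - 1, -1, -1):
        t = tokens[j]
        if isinstance(t, dict) and t["open"] and t["n"] > 0:
            return j
    return None

def render(text):
    tokens = tokenize(text)
    i = 0
    while i < len(tokens):
        closer = tokens[i]
        is_closer = isinstance(closer, dict) and closer["close"] and closer["n"] > 0
        j = find_opener(tokens, i) if is_closer else None
        if j is None:
            i += 1
            continue
        opener = tokens[j]
        # Strong only if BOTH runs still have at least two delimiters left.
        used = 2 if opener["n"] >= 2 and closer["n"] >= 2 else 1
        tag = "strong" if used == 2 else "em"
        inner = "".join(as_text(t) for t in tokens[j + 1:i])
        tokens[j + 1:i] = [f"<{tag}>{inner}</{tag}>"]
        opener["n"] -= used
        closer["n"] -= used
        i = j + 2  # the closer now sits right after the new element
    return "<p>" + "".join(as_text(t) for t in tokens) + "</p>"

print(render("**strong* still strong**"))
# <p><em><em>strong</em> still strong</em>*</p>
```

In this sketch the first closer reached is the single `*`, which pairs with one asterisk of the opening `**` and so forces an `<em>`. The leftover opening `*` then pairs with one asterisk of the closing `**`, leaving the last `*` literal.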

aidantwoods avatar May 13 '17 13:05 aidantwoods

Apologies, correction: it appears that rule 15 does not apply because the given example does not overlap in the way specified.

In which case rule 13 should be applied as far as I can tell:

  13. The number of nestings should be minimized. Thus, for example, an interpretation <strong>...</strong> is always preferred to <em><em>...</em></em>.

In any case,

<p><strong>strong* still strong</strong></p>

is preferred by this rule too.
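Rule 13 does not say how "nestings" are counted. One plausible reading, assumed here, is the maximum depth of nested `<em>`/`<strong>` elements. A small sketch under that assumption (the `DepthCounter` class and `emphasis_depth` helper are hypothetical) compares the two candidate outputs:

```python
from html.parser import HTMLParser

class DepthCounter(HTMLParser):
    """Track the deepest nesting of <em>/<strong> elements seen so far."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("em", "strong"):
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in ("em", "strong"):
            self.depth -= 1

def emphasis_depth(html):
    """Return the maximum emphasis nesting depth in an HTML fragment."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth

expected = "<p><strong>strong* still strong</strong></p>"
reference = "<p><em><em>strong</em> still strong</em>*</p>"
print(emphasis_depth(expected), emphasis_depth(reference))  # 1 2
```

By this measure the `<strong>` interpretation nests less deeply, which is the sense in which rule 13 would prefer it.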

aidantwoods avatar May 13 '17 14:05 aidantwoods

Most implementations seem to agree with this http://johnmacfarlane.net/babelmark2/?text=**strong*+still+strong**

(though the results don't appear on the page for me, I can get some by going through the network responses; screenshots of those responses were attached here)

aidantwoods avatar Jun 02 '17 18:06 aidantwoods

Although I agree that the interpretation of this case is unexpected, I've come to appreciate that it's not possible to give a spec for emphasis that gives "intuitive" results in every case. The best we can do is to minimize the unintuitive cases, and the principles in the spec (now fairly complex) have been motivated by consideration of a large number of cases.

Maybe there's a way to modify the principles to get the "intuitive" result in your case without delivering other unintuitive results elsewhere, and without making it impossible to parse emphasis efficiently. We're very open to suggestions there. But otherwise we may have to accept that this is a case where you need to backslash-escape the asterisk. Not a big deal.

jgm avatar Mar 25 '18 23:03 jgm

So I'll close this, but feel free to re-open if you have a specific proposal.

jgm avatar Mar 25 '18 23:03 jgm

My concern, I suppose, isn't that it's an unintuitive result, rather that the result given by the reference parser contradicts the spec as far as I can tell? I wonder if perhaps we could extract the decision made by the reference parser so that the behaviour here could be formalised.

I don't much mind which result we pick (I would perhaps lean toward the one that I say is "expected" but it doesn't matter so much), rather I think it is important that the result is defined :)

(I'm painfully aware of how complex emphasis parsing is already)

aidantwoods avatar Mar 25 '18 23:03 aidantwoods

(Btw apparently I can't re-open the issue if you closed it o.0 – behaviour is news to me)

aidantwoods avatar Mar 25 '18 23:03 aidantwoods

Sorry, in skimming this I saw the point about rule 15 not applying, but missed the point about rule 13 applying.... I'll re-open.

jgm avatar Mar 26 '18 00:03 jgm

Rule 13 is a bit vague; perhaps if we try to be more precise about what is meant by "minimize nesting", we can bring the spec in conformity with the way the reference parsers treat this case. E.g. maybe we could just say:

  13. In cases of ambiguity, an interpretation <strong>...</strong> is always preferred to <em><em>...</em></em>.

jgm avatar Mar 26 '18 00:03 jgm

I think if you remove the phrase "minimise nesting" then this case becomes undefined, since the difference between what I've called the expected result and the reference parser's isn't picking <strong>...</strong> over <em><em>...</em></em>, but rather picking <strong>...*...</strong> over <em><em>...</em>...</em>*. Without this rule I'm not sure how to choose between these results. That is, the choice between them does minimise nesting, but it's a little more than just picking <strong>...</strong> over <em><em>...</em></em>, because there are positional changes in the structure that come with that choice (i.e. different *s are used).

Just to be clear here, which result are we aiming for? :)

If we're aiming for the result that the reference parser currently gives (<em>s), then I think rule 13 needs to change to what you've said to allow for this result to exist, but I think there should be an additional rule that makes clear why this result occurs (i.e. we need to make it a possible result, and then specify why to pick it over the other one).

If on the other hand we are aiming for the "expected result" (<strong>s), then I think this is only a parser bug.

aidantwoods avatar Mar 26 '18 15:03 aidantwoods