
Emphasis in non-spaced tokens does not seem to be parsed as expected

Open gfx opened this issue 8 years ago • 9 comments

We have introduced CommonMark to our web service and found that some emphasis is not parsed as expected when it appears in non-spaced tokens together with links. Non-spaced tokens are important for agglutinative languages like Japanese, so I think it is a spec bug.

Example

Expect:

foo**bar**baz

foo**[bar](#)**baz

to be:

foo<strong>bar</strong>baz

foo<strong><a href="#">bar</a></strong>baz

but got:

foo<strong>bar</strong>baz # OK

foo**<a href="#">bar</a>**baz # NG: ** are not parsed as emphasis

Environment

  • CommonMarker v0.16.8
  • commonmark.js v0.28.1
    • e.g. echo 'foo**[bar](#)**baz' | commonmark -- /dev/stdin

gfx avatar Aug 10 '17 07:08 gfx

cc @iology, we've had a discussion which briefly touched on some potential issues with spaces in Chinese texts when using CommonMark emphasis, so in case you have any thoughts here.

aidantwoods avatar Aug 10 '17 07:08 aidantwoods

Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.

babelmark2 foo**[bar](#)**baz

I'll try to make a patch to fix it.

gfx avatar Aug 17 '17 01:08 gfx

I'll try to make a patch to fix it.

Well, it could make sense for the left-flanking and right-flanking checks not to take a punctuation character into account if that character was consumed by a link or another Markdown syntax construct. But that can be relatively difficult to implement.

Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.

But it seems most of them are not that clever, just simplistic: punctuation apparently has no impact on left-flanking and right-flanking runs for (most of) them. See foo**+**baz.
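To make the role of punctuation concrete, here is a minimal sketch of the spec's left-flanking and right-flanking checks. It is an illustration only, not code from any of the parsers mentioned: the helper names (`is_space`, `is_punct`, `flanking`, `delimiter_runs`) are made up for this example, and punctuation is assumed to be ASCII-only for brevity.

```python
import re
import string


def is_space(ch):
    # The spec treats the beginning and end of the line as whitespace;
    # this sketch represents both as the empty string.
    return ch == "" or ch.isspace()


def is_punct(ch):
    # Simplifying assumption: ASCII punctuation only.
    return ch != "" and ch in string.punctuation


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not is_space(after) and (
        not is_punct(after) or is_space(before) or is_punct(before)
    )
    right = not is_space(before) and (
        not is_punct(before) or is_space(after) or is_punct(after)
    )
    return left, right


def delimiter_runs(text):
    """List (start, end, left_flanking, right_flanking) for each run of '*'."""
    return [
        (m.start(), m.end(), *flanking(text, m.start(), m.end()))
        for m in re.finditer(r"\*+", text)
    ]


for sample in ["foo**bar**baz", "foo**[bar](#)**baz", "foo**+**baz"]:
    print(sample, delimiter_runs(sample))
```

In this sketch both runs in foo**bar**baz come out left- and right-flanking, while in foo**[bar](#)**baz and foo**+**baz the first run is only right-flanking and the second only left-flanking. Since a `*` run can open emphasis only if it is left-flanking and close it only if it is right-flanking, neither of the last two inputs yields emphasis.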

mity avatar Aug 17 '17 04:08 mity

One possibility would be to say that a delim run that is immediately to the left of an open parenthesis, bracket, or brace is automatically left flanking, and a delim run that is immediately to the right of a close parenthesis, bracket, or brace is automatically right flanking.
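As a rough sketch of what this rule could look like on top of the existing flanking checks (the function names, the `OPENERS`/`CLOSERS` sets, and the ASCII-only punctuation test are assumptions made for this illustration, not part of the spec or of any implementation):

```python
import re
import string

OPENERS = {"(", "[", "{"}  # hypothetical: characters that force left-flanking
CLOSERS = {")", "]", "}"}  # hypothetical: characters that force right-flanking


def is_space(ch):
    # Start and end of line are modelled as "" and count as whitespace.
    return ch == "" or ch.isspace()


def is_punct(ch):
    # Simplifying assumption: ASCII punctuation only.
    return ch != "" and ch in string.punctuation


def flanking_proposed(text, start, end):
    """Flanking flags for text[start:end] with the bracket rule applied."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    # Existing rules.
    left = not is_space(after) and (
        not is_punct(after) or is_space(before) or is_punct(before)
    )
    right = not is_space(before) and (
        not is_punct(before) or is_space(after) or is_punct(after)
    )
    # Proposed additions: only the character on the "outside" matters.
    if after in OPENERS:
        left = True
    if before in CLOSERS:
        right = True
    return left, right


def runs_proposed(text):
    """List (start, end, left_flanking, right_flanking) for each run of '*'."""
    return [
        (m.start(), m.end(), *flanking_proposed(text, m.start(), m.end()))
        for m in re.finditer(r"\*+", text)
    ]


print(runs_proposed("foo**[bar](#)**baz"))
print(runs_proposed("foo**+**baz"))
```

Under these assumptions both runs in foo**[bar](#)**baz become left- and right-flanking, so the first can open and the second can close. The flags for foo**+**baz stay as they are today, because `+` is neither an opening nor a closing bracket.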

Any thoughts about this proposal?

jgm avatar Mar 25 '18 22:03 jgm

Any thoughts about this proposal?

The proposal would "fix" foo**[bar](#)**baz and "break" foo[**bar**](#)baz instead. So the overall score would be the same, at the cost of yet another rule to implement.

mity avatar Mar 25 '18 23:03 mity

Martin Mitáš writes:

Any thoughts about this proposal?

The proposal would "fix" foo**[bar](#)**baz and "break" foo[**bar**](#)baz instead. So the overall score would be the same, at the cost of yet another rule to implement.

Why would it break the other case? The proposal says that ** before [ counts as left-flanking, and ** after ] counts as right-flanking. It says nothing about ** after [ or ** before ].

jgm avatar Mar 26 '18 00:03 jgm

Oops. You are right. Still I am not sure about it.

Although it is an artificial example, consider trying to make xxx bold in xxx(. Note that escaping can help you here only if ( does not form, e.g., the start of a link.

Also, the proposal does not help in situations where the punctuation character itself does not imply opening/closing per se, yet within the context of Markdown syntax parsing it does:

**&sum;**x

or:

foo**![bar](#)**baz

mity avatar Mar 26 '18 06:03 mity

Also, returning to the original report, I might equally want to make the text span before and/or after the link bold:

**foo**[bar](#)baz
foo[bar](#)**baz**
**foo**[bar](#)**baz**

mity avatar Mar 26 '18 06:03 mity

I don't think my proposal creates any problems for

**foo**[bar](#)baz

or these others. With current spec rules, the first ** in this example is left-flanking but not right-flanking, and the second ** is right-flanking but not left-flanking. With the change I suggested, the second ** would be BOTH right-flanking and left-flanking. So you'd still get boldface here.
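The flanking claims here can be checked with a small sketch that computes the flags with and without the suggested bracket rule. Everything in it is illustrative: the `bracket_rule` switch and the helper names are invented for this example, and punctuation is assumed to be ASCII-only.

```python
import re
import string


def is_space(ch):
    # Start and end of line are modelled as "" and count as whitespace.
    return ch == "" or ch.isspace()


def is_punct(ch):
    # Simplifying assumption: ASCII punctuation only.
    return ch != "" and ch in string.punctuation


def flanking(text, start, end, bracket_rule):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not is_space(after) and (
        not is_punct(after) or is_space(before) or is_punct(before)
    )
    right = not is_space(before) and (
        not is_punct(before) or is_space(after) or is_punct(after)
    )
    if bracket_rule:
        # Hypothetical switch for the suggested change.
        left = left or after in {"(", "[", "{"}
        right = right or before in {")", "]", "}"}
    return left, right


def runs(text, bracket_rule):
    """List (start, end, left_flanking, right_flanking) for each run of '*'."""
    return [
        (m.start(), m.end(), *flanking(text, m.start(), m.end(), bracket_rule))
        for m in re.finditer(r"\*+", text)
    ]


for sample in ["**foo**[bar](#)baz", "**&sum;**x", "foo**![bar](#)**baz"]:
    print(sample)
    print("  current: ", runs(sample, bracket_rule=False))
    print("  proposed:", runs(sample, bracket_rule=True))
```

For `**foo**[bar](#)baz` the second run goes from right-flanking only to both, so it can still close. For `**&sum;**x` nothing changes, and the second run is not right-flanking under either rule. For `foo**![bar](#)**baz` the first run stays right-flanking only, because it is followed by `!` rather than an opening bracket.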

The example

**&sum;**x

is not handled by my proposal, but I'm less worried about this kind of case. The example

foo**![bar](#)**baz

is also not handled. This, too, seems like something that will come up much more rarely than a boldface link. (Why would you boldface an image anyway?) So I'm less concerned about it.

Of course, we could handle all these cases if we wanted to, by looking at more context than just the immediately preceding and following characters, but that makes parsing more complicated and maybe less efficient.

jgm avatar Mar 26 '18 21:03 jgm