commonmark-spec
Non-spaced tokens with emphasis are not parsed as expected
We have introduced CommonMark to our web service and found that emphasis does not work as expected when it wraps a link inside a non-spaced token. Non-spaced tokens are important for agglutinative languages like Japanese, so I think this is a spec bug.
Example
Expect:
foo**bar**baz
foo**[bar](#)**baz
to be:
foo<strong>bar</strong>baz
foo<strong><a href="#">bar</a></strong>baz
but got:
foo<strong>bar</strong>baz # OK
foo**<a href="#">bar</a>**baz # NG: ** are not parsed as emphasis
Environment
- CommonMarker v0.16.8
- commonmark.js v0.28.1
- e.g. echo 'foo**[bar](#)**baz' | commonmark -- /dev/stdin
cc @iology: we've had a discussion that briefly touched on potential issues with spaces in Chinese text when using CommonMark emphasis, so I'm pinging you in case you have any thoughts here.
Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.
I'll try to make a patch to fix it.
> I'll try to make a patch to fix it.
Well, it could make sense for the left-flanking and right-flanking rules not to take a punctuation character into account if that character was consumed by a link or another Markdown construct. But that could be relatively difficult to implement.
> Interestingly, most of the Markdown parsers listed in babelmark2 parse it as expected.
But it seems that most of them are not that clever, just simplistic: for (most of) them, punctuation evidently has no effect on whether a run is left-flanking or right-flanking. See foo**+**baz.
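To make the foo**+**baz comparison concrete, here is a rough sketch of the flanking test as I read the current spec, next to a punctuation-blind variant. The punctuation-blind variant is only an assumption about how the simpler parsers behave, based on their output, not on their code. The names is_ws, is_punct and flanking are made up for this example.

```python
import string
import unicodedata

def is_ws(ch):
    # None stands for start/end of line, which the spec treats as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def is_punct(ch):
    # ASCII punctuation plus the Unicode P* general categories.
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )

def flanking(prev, nxt, punctuation_aware=True):
    """Return (left_flanking, right_flanking) for a run between prev and nxt."""
    left = not is_ws(nxt)
    right = not is_ws(prev)
    if punctuation_aware:
        # Spec rule: punctuation on the inside must be balanced by
        # whitespace or punctuation on the outside.
        left = left and (not is_punct(nxt) or is_ws(prev) or is_punct(prev))
        right = right and (not is_punct(prev) or is_ws(nxt) or is_punct(nxt))
    return left, right

# foo**+**baz: the first run sits between "o" and "+", the second between "+" and "b".
for aware in (True, False):
    can_open = flanking("o", "+", aware)[0]
    can_close = flanking("+", "b", aware)[1]
    print("punctuation-aware" if aware else "punctuation-blind", can_open and can_close)
```

With the punctuation-aware rules the first run cannot open and the second cannot close, so no emphasis is produced; with the punctuation-blind variant both tests pass.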
One possibility would be to say that a delim run that is immediately to the left of an open parenthesis, bracket, or brace is automatically left flanking, and a delim run that is immediately to the right of a close parenthesis, bracket, or brace is automatically right flanking.
Any thoughts about this proposal?
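For discussion, a minimal sketch of how this extra rule might sit on top of the existing flanking test. This is not taken from any implementation; the helper names and the bracket_rule switch are hypothetical, and only the characters immediately before and after the run are inspected.

```python
import string
import unicodedata

OPENERS = "([{"
CLOSERS = ")]}"

def is_ws(ch):
    # None stands for start/end of line, which the spec treats as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def is_punct(ch):
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )

def flanking(prev, nxt, bracket_rule=False):
    """Return (left_flanking, right_flanking) for a run between prev and nxt."""
    left = not is_ws(nxt) and (not is_punct(nxt) or is_ws(prev) or is_punct(prev))
    right = not is_ws(prev) and (not is_punct(prev) or is_ws(nxt) or is_punct(nxt))
    if bracket_rule:
        # Proposed addition: a run directly before an opening bracket is
        # left-flanking, and a run directly after a closing bracket is
        # right-flanking, regardless of the other neighbour.
        left = left or (nxt is not None and nxt in OPENERS)
        right = right or (prev is not None and prev in CLOSERS)
    return left, right

# foo**[bar](#)**baz: first run between "o" and "[", second between ")" and "b".
print(flanking("o", "["), flanking(")", "b"))              # current rules
print(flanking("o", "[", True), flanking(")", "b", True))  # with the proposed rule
```

Under the current rules the first run is only right-flanking and the second only left-flanking, so they cannot pair up. With the extra rule both runs are left- and right-flanking, so the first can open and the second can close.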
> Any thoughts about this proposal?
The proposal would "fix" foo**[bar](#)**baz and "break" foo[**bar**](#)baz instead. So the overall score would be the same, at the cost of yet another rule to implement.
Martin Mitáš [email protected] writes:
> > Any thoughts about this proposal?
>
> The proposal would "fix" foo**[bar](#)**baz and "break" foo[**bar**](#)baz instead. So the overall score would be the same, at the cost of yet another rule to implement.
Why would it break the other case? The proposal says that ** before [ counts as left-flanking, and ** after ] counts as right-flanking. It says nothing about ** after [ or ** before ].
Oops. You are right. Still I am not sure about it.
Although it is an artificial example, consider trying to make xxx bold in xxx(. Note that escaping can help you here only if the ( does not form, for example, the start of a link.
Also, the proposal does not help in situations where the punctuation character does not by itself imply opening or closing, yet does so within the context of Markdown parsing:
**∑**x
or:
foo****baz
Also, returning to the original report, I might just as well want to make the text span before and/or after the link bold:
**foo**[bar](#)baz
foo[bar](#)**baz**
**foo**[bar](#)**baz**
I don't think my proposal creates any problems for
**foo**[bar](#)baz
or these others. With current spec rules, the first ** in this example is left-flanking but not right-flanking, and the second ** is right-flanking but not left-flanking. With the change I suggested, the second ** would be BOTH right-flanking and left-flanking. So you'd still get boldface here.
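The flanking claims above can be checked mechanically. Below is a small sketch that scans one line for runs of * and reports their flanking status with and without the suggested change. It is illustrative only: the runs helper is hypothetical, and it ignores backslash escapes, code spans, and the extra rules for _.

```python
import re
import string
import unicodedata

def is_ws(ch):
    # None stands for start/end of line, which the spec treats as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def is_punct(ch):
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )

def runs(text, bracket_rule=False):
    """Yield (offset, left_flanking, right_flanking) for each run of '*'."""
    for m in re.finditer(r"\*+", text):
        prev = text[m.start() - 1] if m.start() > 0 else None
        nxt = text[m.end()] if m.end() < len(text) else None
        left = not is_ws(nxt) and (not is_punct(nxt) or is_ws(prev) or is_punct(prev))
        right = not is_ws(prev) and (not is_punct(prev) or is_ws(nxt) or is_punct(nxt))
        if bracket_rule:
            # Suggested change: brackets force the flanking status on that side.
            left = left or (nxt is not None and nxt in "([{")
            right = right or (prev is not None and prev in ")]}")
        yield m.start(), left, right

for rule in (False, True):
    print("suggested rule" if rule else "current rules",
          list(runs("**foo**[bar](#)baz", rule)))
```

In both cases the first run can open and the second can close; the only difference is that the second run additionally becomes left-flanking under the suggested rule.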
The example
**∑**x
is not handled by my proposal, but I'm less worried about this kind of case. The example
foo****baz
is also not handled. This, too, seems like something that will come up much more rarely than a boldface link. (Why would you boldface an image anyway?) So I'm less concerned about it.
Of course, we could handle all these cases if we wanted to, by looking at more context than just the immediately preceding and following characters, but that makes parsing more complicated and maybe less efficient.