commonmark-spec

Character references in link definition labels

Open wooorm opened this issue 6 years ago • 5 comments

  • Character references are allowed everywhere, except in fenced code, indented code, or code spans
  • They represent their resolved character, not syntax

There’s even example 318 of having them in link definition destinations and link definition titles.

But, the following does not resolve into a link:

[&copy;]: example.com

[©][]

I interpret the spec as saying that it should resolve, but the dingus does not resolve it. This may be a bug in the dingus implementation rather than in the spec.

wooorm avatar Oct 01 '19 18:10 wooorm

I agree that, to meet author expectation or intuition, character references of all kinds should be normalized in link labels (and elsewhere), especially since letter case is being ignored. Unfortunately, only a single implementation, Maruku, does it this way, although most CM-conformant parsers (and Pandoc) will happily convert any HTML entities to plain characters on output.

One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, strip leading and trailing whitespace and collapse consecutive internal whitespace to a single space.

Note that matching is performed on normalized strings, not parsed inline content. So the following does not match, even though the labels define equivalent inline content:

Example 541

[bar][foo\!]

[foo!]: /url
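The normalization quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted wording, not code from any CommonMark implementation; the function names `normalize_label` and `labels_match` are hypothetical.

```python
import re


def normalize_label(label: str) -> str:
    """Normalize a bracketed link label following the quoted spec text."""
    inner = label[1:-1]        # strip the opening and closing brackets
    folded = inner.casefold()  # Unicode case fold
    # Strip leading/trailing whitespace, collapse internal runs to one space.
    return re.sub(r"\s+", " ", folded.strip())


def labels_match(a: str, b: str) -> bool:
    """Two labels match when their normalized forms are equal."""
    return normalize_label(a) == normalize_label(b)


print(labels_match("[Foo   Bar]", "[foo bar]"))  # case and whitespace are ignored
print(labels_match(r"[foo\!]", "[foo!]"))        # the backslash is kept verbatim
print(labels_match("[&copy;]", "[©]"))           # the reference is kept verbatim
```

Because the comparison runs on the raw label string, the backslash in `[foo\!]` and the reference in `[&copy;]` both survive normalization, which is consistent with Example 541 not linking.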

The rules for the link text are the same as with inline links.

An inline link […]
character references in the destination will be parsed into the corresponding Unicode code points, as usual.

character references are recognized in any context besides code spans or code blocks, including URLs, link titles, […]

link label […]
The contents of the first link label are parsed as inlines, which are used as the link’s text.

The link text may contain inline content: [Example 526]

Crissov avatar Oct 02 '19 07:10 Crissov

This might be related to #572.

mgeier avatar Oct 02 '19 08:10 mgeier

Btw, I think this should be true for character escapes too:

[&copy;]: a.com
[\!]: b.com

Both should link: [©], [!]

Yields:

Both should link: [©], [!]
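For comparison, here is a minimal Python sketch of the behavior proposed in this thread: resolve backslash escapes and character references first, then normalize. It is an assumption about how such matching could work, not what the spec or commonmark.js currently does. The name `resolve_then_normalize` is hypothetical, and Python's `html.unescape` stands in for the spec's entity table.

```python
import html
import re

# One pass over the label: either a backslash escape of an ASCII punctuation
# character, or a semicolon-terminated character reference. A single pass
# ensures that an escaped ampersand (\&copy;) is not resolved afterwards.
TOKEN = re.compile(
    r"\\([!-/:-@\[-`{-~])"
    r"|(&(?:#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);)"
)


def _resolve(match: re.Match) -> str:
    if match.group(1) is not None:
        return match.group(1)            # drop the backslash, keep the character
    return html.unescape(match.group(2))  # unknown names are left unchanged


def resolve_then_normalize(label: str) -> str:
    """Hypothetical matching key: resolve escapes and references, then normalize."""
    inner = TOKEN.sub(_resolve, label[1:-1])
    return re.sub(r"\s+", " ", inner.casefold().strip())


print(resolve_then_normalize("[&copy;]") == resolve_then_normalize("[©]"))
print(resolve_then_normalize(r"[\!]") == resolve_then_normalize("[!]"))
```

With a key like this, both definitions in the example above would match their references, while an escaped reference such as `[\&copy;]` would still be treated as the literal text `&copy;`.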

wooorm avatar Jul 04 '20 17:07 wooorm

@jgm Is this something you agree with? I can create a PR to clarify the docs

wooorm avatar Jul 04 '20 18:07 wooorm

I don't think this needs clarification of the docs so much as a bug report against CommonMark.js.

That said, every single Markdown implementation but one fails this test.

vassudanagunta avatar Jul 06 '20 20:07 vassudanagunta