commonmark-spec
Consider preventing autolinks in links
Background
The CommonMark spec prevents “actual” links (those with a reference, e.g., [x], or those with a resource, e.g., (x "y")) from occurring inside each other:
[a [b](c) d](e)
<p>[a <a href="c">b</a> d](e)</p>
The reason is that the HTML spec does not allow it; specifically, the parsing algorithm of the HTML spec makes it impossible:
document.body.innerHTML = '<p><a href="e">a <a href="c">b</a> d</a></p>';
console.log(document.body.innerHTML)
Yields:
<p><a href="e">a </a><a href="c">b</a> d</p>
(The first link is closed when the new link is opened; its end tag is later ignored.)
CommonMark does this by marking earlier link starts as “inactive” whenever a link is matched (the “deep” one is matched first). It does not do that for links in images, or images in links, because it doesn’t need to: images in links are fine, and links “in” images do not generate HTML (somewhat related: https://github.com/commonmark/commonmark-spec/issues/716).
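A minimal sketch of that bookkeeping, assuming a toy parser that only understands plain text and inline [text](destination) links. The names parseLinks, renderLinks, and openers are hypothetical and only for illustration; real implementations are structured differently and handle far more cases.

```javascript
// Toy parser: plain text and [text](dest) links only (no escaping, no titles).
function parseLinks(src) {
  const nodes = [];   // flat output: strings and link objects
  const openers = []; // stack of "[" positions: { index, active }
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "[") {
      openers.push({ index: nodes.length, active: true });
      nodes.push("[");
      i++;
    } else if (ch === "]") {
      const opener = openers.pop();
      const end = /^\]\(([^()\s]*)\)/.exec(src.slice(i));
      if (opener && opener.active && end) {
        // Everything after the "[" becomes the link's children.
        const children = nodes.splice(opener.index + 1);
        nodes.pop(); // drop the literal "["
        nodes.push({ type: "link", href: end[1], children });
        // A link was matched: earlier link starts can no longer form links.
        for (const o of openers) o.active = false;
        i += end[0].length;
      } else {
        nodes.push("]"); // inactive or unmatched: keep the bracket as text
        i++;
      }
    } else {
      nodes.push(ch);
      i++;
    }
  }
  return nodes;
}

function renderLinks(nodes) {
  return nodes
    .map((n) =>
      typeof n === "string"
        ? n
        : `<a href="${n.href}">${renderLinks(n.children)}</a>`
    )
    .join("");
}

console.log(renderLinks(parseLinks("[a [b](c) d](e)")));
// [a <a href="c">b</a> d](e)
```

The inner link is matched first, so by the time the outer ] is reached its opener is already inactive and the brackets fall back to literal text.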
Problem
However, there is another way to create links in links with markdown: autolinks (<https://example.com>) inside links:
[a <https://example.com> b](c)
Yields:
<p><a href="c">a <a href="https://example.com">https://example.com</a> b</a></p>
Solution
There are two solutions. Which one to choose depends on which of the two links is more likely the one an author intended.
Solution A: mark link starts as inactive
The code for this would be very similar to the one already used for “normal” links: when a valid autolink is parsed, mark earlier link starts as inactive. The output then would be:
<p>[a <a href="https://example.com">https://example.com</a> b](c)</p>
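A minimal sketch of solution A, assuming a toy parser that handles only plain text, inline [text](destination) links, and <scheme:...> autolinks. The names parseInline, renderInline, AUTOLINK, and LINK_END are hypothetical, and the autolink pattern is a simplification of the spec’s rules.

```javascript
// Toy inline parser; not the reference implementation.
const AUTOLINK = /^<([A-Za-z][A-Za-z0-9+.-]{1,31}:[^\s<>]*)>/;
const LINK_END = /^\]\(([^()\s]*)\)/;

function parseInline(src) {
  const nodes = [];   // flat output: strings, links, autolinks
  const openers = []; // stack of "[" positions: { index, active }
  let i = 0;
  while (i < src.length) {
    const rest = src.slice(i);
    const auto = AUTOLINK.exec(rest);
    if (auto) {
      nodes.push({ type: "autolink", href: auto[1] });
      // Solution A: an autolink deactivates earlier link starts,
      // exactly as a matched "normal" link does.
      for (const o of openers) o.active = false;
      i += auto[0].length;
    } else if (src[i] === "[") {
      openers.push({ index: nodes.length, active: true });
      nodes.push("[");
      i++;
    } else if (src[i] === "]") {
      const opener = openers.pop();
      const end = LINK_END.exec(rest);
      if (opener && opener.active && end) {
        const children = nodes.splice(opener.index + 1);
        nodes.pop(); // drop the literal "["
        nodes.push({ type: "link", href: end[1], children });
        for (const o of openers) o.active = false;
        i += end[0].length;
      } else {
        nodes.push("]"); // inactive or unmatched: keep the bracket as text
        i++;
      }
    } else {
      nodes.push(src[i]);
      i++;
    }
  }
  return nodes;
}

function renderInline(nodes) {
  return nodes
    .map((n) => {
      if (typeof n === "string") return n;
      if (n.type === "autolink") return `<a href="${n.href}">${n.href}</a>`;
      return `<a href="${n.href}">${renderInline(n.children)}</a>`;
    })
    .join("");
}

console.log(renderInline(parseInline("[a <https://example.com> b](c)")));
// [a <a href="https://example.com">https://example.com</a> b](c)
```

The only change compared to link-in-link handling is the loop in the autolink branch, which is why this option should be cheap to implement.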
Solution B: output the source of an autolink when compiling inside links
The code for this would be very similar to what is likely needed for images, when compiling.
When in an image alt, no tags are generated.
Something similar can be done when inside a link: output &lt; (instead of <a href="https://example.com">), output the text like normal, and output &gt; (instead of </a>).
The output then would be:
<p><a href="c">a &lt;https://example.com&gt; b</a></p>
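A minimal sketch of solution B at the compile step, assuming the parser has already produced a small tree. The node shapes (paragraph, link, autolink, text) and the names escapeHtml and renderNode are hypothetical; the point is only the inLink flag that is passed down while compiling a link’s children.

```javascript
// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Compile a node to HTML. `inLink` is true while compiling inside a link.
function renderNode(node, inLink = false) {
  switch (node.type) {
    case "text":
      return escapeHtml(node.value);
    case "autolink":
      // Solution B: inside a link, emit the autolink's source as escaped text
      // instead of a nested <a> element.
      if (inLink) return escapeHtml(`<${node.href}>`);
      return `<a href="${escapeHtml(node.href)}">${escapeHtml(node.href)}</a>`;
    case "link": {
      const body = node.children.map((c) => renderNode(c, true)).join("");
      return `<a href="${escapeHtml(node.href)}">${body}</a>`;
    }
    case "paragraph": {
      const body = node.children.map((c) => renderNode(c, inLink)).join("");
      return `<p>${body}</p>`;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Hand-built tree for: [a <https://example.com> b](c)
const tree = {
  type: "paragraph",
  children: [
    {
      type: "link",
      href: "c",
      children: [
        { type: "text", value: "a " },
        { type: "autolink", href: "https://example.com" },
        { type: "text", value: " b" },
      ],
    },
  ],
};

console.log(renderNode(tree));
// <p><a href="c">a &lt;https://example.com&gt; b</a></p>
```

Outside a link the same autolink node still compiles to a regular <a> element, so only the nested case changes.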
Consideration
The current state is clearly broken: it is not something anyone wants or expects. HTML doesn’t allow it, and only half of an author’s outer link is “clickable”.
With solution A, both URLs are displayed verbatim; only the inner one is “clickable”, but both are usable. This is like how “normal” links work in CommonMark, so I think it is the best solution.
I think solution B produces cleaner output, and chances are it’s closer to what an author intended when both URLs are the same, because it shows the URL that a reader is taken to:
[For more information, see <https://example.com/some-page>!](https://example.com/some-page).
However, solution B gets confusing if the URLs are different: a reader sees one URL but is taken to another:
[For more information, see <https://example.com/some-page>!](https://example.com/a-completely-different-page).
I think both are acceptable and an improvement, and I’d prefer solution A a little bit more than solution B.
Thanks for raising this. It would be good to get more feedback on the proposed solutions.