commonmark-hs

Autolinks extension should ignore URIs inside link descriptions

Open • kukimik opened this issue 3 years ago • 1 comment

Calling:

commonmark-cli -x autolinks <<EOF
[https://www.website.com](https://www.website.com#something)

[[email protected]](mailto:[email protected]?subject=Some%20subject)

[A website similar to https://www.foo.com and https://www.bar.com](https://www.baz.com)
EOF

results in (note the nested <a> tags):

<p><a href="https://www.website.com#something"><a href="https://www.website.com">https://www.website.com</a></a></p>
<p><a href="mailto:[email protected]?subject=Some%20subject"><a href="mailto:[email protected]">[email protected]</a></a></p>
<p><a href="https://www.baz.com">A website similar to <a href="https://www.foo.com">https://www.foo.com</a> and <a href="https://www.bar.com">https://www.bar.com</a></a></p>

while I would expect

<p><a href="https://www.website.com#something">https://www.website.com</a></p>
<p><a href="mailto:[email protected]?subject=Some%20subject">[email protected]</a></p>
<p><a href="https://www.baz.com">A website similar to https://www.foo.com and https://www.bar.com</a></p>

One reason is that nested links are illegal in HTML5 and HTML4.

This bit me in https://github.com/EmaApps/emanote/issues/349.

kukimik, Sep 14 '22 19:09

Related issue about explicit autolinks: https://github.com/commonmark/commonmark-spec/issues/719

Actually this may be a bit hard to achieve, given the architecture used in this library. If we were parsing to an AST, we could simply replace any links inside a link description with their link text. But this library allows you to parse directly to an output format, so this isn't possible in general. Moreover, we don't know whether a bit of text is part of a link description until AFTER we've parsed it as an autolink (since the matching of brackets takes place at a later stage).

If you parse to an AST (which is possible, just not required, with this library), then you can always walk the document after parsing and remove links inside links.
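
A minimal sketch of such a post-parse walk, for illustration only. The `Inline` type below is a hypothetical, simplified AST and not a type exported by commonmark-hs; `stripLinks` and `unnestLinks` are likewise made-up names. The idea is to keep each outermost link and replace any link found inside its description with that inner link's own description:

```haskell
module Main where

-- Hypothetical, simplified inline AST (an assumption for this sketch,
-- not a type from commonmark-hs).
data Inline
  = Str String
  | Emph [Inline]
  | Link String [Inline]   -- destination, description
  deriving (Eq, Show)

-- Replace every link, at any depth, with its description.
stripLinks :: [Inline] -> [Inline]
stripLinks = concatMap go
  where
    go (Link _ desc) = stripLinks desc
    go (Emph xs)     = [Emph (stripLinks xs)]
    go x             = [x]

-- Keep outermost links, but unwrap any link nested in a description.
unnestLinks :: [Inline] -> [Inline]
unnestLinks = map go
  where
    go (Link dest desc) = Link dest (stripLinks desc)
    go (Emph xs)        = Emph (unnestLinks xs)
    go x                = x

-- Mirrors the third example from the report: two autolinks inside
-- the description of an explicit link.
main :: IO ()
main = print (unnestLinks
  [ Link "https://www.baz.com"
      [ Str "A website similar to "
      , Link "https://www.foo.com" [Str "https://www.foo.com"]
      , Str " and "
      , Link "https://www.bar.com" [Str "https://www.bar.com"]
      ]
  ])
```

With a real AST the same two-pass shape should carry over: one traversal that finds links, and an inner traversal that flattens links within each description.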

jgm, Sep 19 '22 01:09