
Autolink edge cases

Open digitalmoksha opened this issue 10 months ago • 6 comments

Found a couple autolink edge cases:

  • See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;

    comrak: <p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
    cmark-gfm: <p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>

  • http://example.com/[abc]

    comrak: <p><a href="http://example.com/%5Babc">http://example.com/[abc</a>]</p>
    cmark-gfm: <p><a href="http://example.com/%5Babc%5D">http://example.com/[abc]</a></p>

digitalmoksha avatar Apr 26 '24 18:04 digitalmoksha

Re: the second item

Rinku actually does balancing, like both cmark and comrak do for parentheses.

Looking at the cmark code, they don't consider a bracket as an ending delimiter - comrak does.

And it looks like I probably broke this when I added the relaxed-autolinks option - I added [ and ] to LINK_END_ASSORTMENT. https://github.com/kivikakk/comrak/pull/325/files
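To make the effect concrete, here is a minimal sketch of trailing-delimiter trimming. The function name and the exact delimiter sets are assumptions for illustration, not comrak's actual code; only the idea of adding [ and ] to the set of link-ending characters comes from the comment above.

```rust
/// Hypothetical helper, not comrak's implementation: strip every trailing
/// byte of `candidate` that appears in `end_delims`.
/// Assumes `end_delims` holds ASCII bytes only, so the slice below always
/// ends on a character boundary.
fn trim_trailing_delims<'a>(candidate: &'a str, end_delims: &[u8]) -> &'a str {
    let bytes = candidate.as_bytes();
    let mut end = bytes.len();
    // Walk backwards while the last byte is one of the ending delimiters.
    while end > 0 && end_delims.contains(&bytes[end - 1]) {
        end -= 1;
    }
    &candidate[..end]
}
```

With a set that includes the brackets, such as b"?!.,:*_~'\"[]", the candidate http://example.com/[abc] loses its final ], which matches the comrak output reported above. With the brackets left out of the set, the candidate is returned unchanged, which matches the cmark-gfm output.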

I can either

  • change the code to behave as cmark does by default, and keep the current behavior (or Rinku style) only when the relaxed-autolinks option is set
  • leave as is
  • make it Rinku style (supporting balanced brackets), so http://example.com[abc]] would give <a href="http://example.com[abc]">http://example.com[abc]</a>]
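The balanced-bracket option could be sketched roughly as follows. This is an illustration under assumptions: the function name is made up, and it mirrors the general idea of balancing (as done for parentheses) rather than reproducing Rinku's or comrak's code.

```rust
/// Hypothetical helper, not Rinku's or comrak's implementation: drop trailing
/// `]` characters from `candidate` for as long as closing brackets outnumber
/// opening brackets in the remaining text.
fn trim_unbalanced_brackets(candidate: &str) -> &str {
    let bytes = candidate.as_bytes();
    let mut end = bytes.len();
    while end > 0 && bytes[end - 1] == b']' {
        let opening = bytes[..end].iter().filter(|&&b| b == b'[').count();
        let closing = bytes[..end].iter().filter(|&&b| b == b']').count();
        if closing <= opening {
            // The trailing bracket has a partner, so it stays in the link.
            break;
        }
        // Unmatched trailing bracket: leave it outside the link.
        end -= 1;
    }
    // Only ASCII `]` bytes were removed, so `end` is a valid char boundary.
    &candidate[..end]
}
```

Called on http://example.com[abc]], this returns http://example.com[abc], leaving the final ] outside the link as in the example above.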

digitalmoksha avatar Apr 26 '24 21:04 digitalmoksha

re: the first item

It looks like by the time we start trying to detect the autolink, the data has already been decoded, meaning the text is <<<http://example.com/>>> - they are no longer HTML entities. Not sure what, if anything, can be done about that.
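For illustration only, here is one way the decoded text could still produce the cmark-gfm result: end the link at whitespace or a literal <, then leave any trailing > outside it. This is an assumption made for the sketch, not a description of how cmark-gfm or comrak actually handles the case, and the function name is hypothetical.

```rust
/// Hypothetical helper, not taken from comrak or cmark-gfm: `text` starts at
/// the URL scheme, and the returned slice is the part treated as the link.
fn link_extent(text: &str) -> &str {
    // Assumed rule 1: the link ends at the first whitespace or literal '<'.
    let mut end = text
        .find(|c: char| c.is_whitespace() || c == '<')
        .unwrap_or(text.len());
    // Assumed rule 2: trailing '>' characters are left outside the link.
    while end > 0 && text.as_bytes()[end - 1] == b'>' {
        end -= 1;
    }
    // `end` comes from `find` or from stepping back over ASCII '>',
    // so it is always a valid char boundary.
    &text[..end]
}
```

Under these assumed rules, http://example.com/>>> yields http://example.com/, so the three > characters would be rendered after the closing </a> rather than percent-encoded into the href.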

My head officially hurts... 🤕

digitalmoksha avatar Apr 26 '24 22:04 digitalmoksha

What led me to this is that I'm trying to get rid of a custom auto_link filter that mimics what Rinku does. These are the two tests that are failing.

I may decide it's good enough to switch - I think these really are edge cases, and I'm not sure how often we see them in the wild.

digitalmoksha avatar Apr 26 '24 22:04 digitalmoksha

Yes, indeed; Rinku is some preeeetty antique software by this stage (with no commit from the primary author since 2016, and none from the other maintainer (me!) since 2019), and I imagine the remaining users are pretty few and far between; certainly not at GitHub since the cmark-gfm switch happened, as its own autolink was used from then, which is what Comrak aims to emulate.

Ideally we continue to match cmark-gfm in regular mode — I don't mind what the behaviour is once relaxed-autolinks is specified. Let me know if you want a hand with the former.

kivikakk avatar Apr 28 '24 09:04 kivikakk

Rinku is some preeeetty antique software by this stage

oh yes, very much 😄

Ideally we continue to match cmark-gfm in regular mode

totally agree. Created https://github.com/kivikakk/comrak/pull/386 to address this.

digitalmoksha avatar Apr 28 '24 18:04 digitalmoksha

Alright! So we have the second item addressed by #386 — thanks very much — which leaves us with this unpleasantness:

$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | comrak -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | ~/g/archive/cmark-gfm/build/src/cmark-gfm -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>

I might have a look into this in the next couple of days!

kivikakk avatar Apr 29 '24 15:04 kivikakk

I might have a look into this in the next couple of days!

Turned into a couple of months, but I got there!

kivikakk avatar Jul 10 '24 14:07 kivikakk