
Characters in hyperlinks getting escaped

NickSto opened this issue 3 years ago · 1 comment

The Problem

Our current version of remark breaks links that contain characters like & and _ by escaping them with a preceding backslash.

Solutions

Currently, the only workaround is to use HTML for your hyperlinks instead of Markdown, which is super suboptimal.

The best solution would be to update remark, but it's a Gridsome dependency, so for now we're stuck with this version.

Another not-great workaround is to use JavaScript to recognize these broken links and rewrite them.
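A minimal sketch of that JavaScript workaround, assuming the rendered pages contain <a> elements whose href attribute still carries the stray backslashes. The function name fixEscapedLinks is hypothetical. It reads the raw attribute with getAttribute() rather than the resolved href property, because browsers normalize the resolved URL (for example, a backslash in the path of an http(s) URL is treated as /), which would hide the problem.

```javascript
// A backslash followed by an ASCII punctuation character: the kind of
// escape remark inserts (e.g. "\&" or "\_").
const ESCAPED_PUNCT = /\\([!-\/:-@\[-`{-~])/g;

// Hypothetical helper: strip those escapes from every link under `root`.
// Returns how many links were rewritten.
function fixEscapedLinks(root) {
  let fixed = 0;
  for (const a of root.querySelectorAll('a[href]')) {
    // Use the raw attribute, not a.href, which the browser has already parsed.
    const raw = a.getAttribute('href');
    const clean = raw.replace(ESCAPED_PUNCT, '$1');
    if (clean !== raw) {
      a.setAttribute('href', clean);
      fixed++;
    }
  }
  return fixed;
}

// Only run automatically in a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => fixEscapedLinks(document));
}
```

This assumes no legitimate URL on the site contains a backslash before punctuation; if one does, it would be rewritten too.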

Examples

Note: The original issue this came up in is #800, but I wanted to make this a more general issue, not just about that instance.

NickSto, Oct 27 '21 21:10

Update

It looks like it's a combination of two factors:

  1. The preprocessing step inserts the backslashes.
  2. Gridsome incorrectly preserves the backslashes in the resulting HTML.

Preprocessing

mdfixer.mjs parses each Markdown file to fix several content issues, then serializes the Markdown into the build/ directory. It uses remark for this, and remark escapes characters like & even inside link URLs.
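To see which serialized files are affected, one option is to scan the output in build/ for inline link destinations that contain a backslash escape. This is a rough sketch with a hypothetical function name, findEscapedDestinations; it uses a regex rather than a real Markdown parser, so it assumes destinations have no spaces, parentheses, or titles.

```javascript
// Captures the destination of an inline link, e.g. the URL in [text](URL).
// Simplification: no spaces, parentheses, or "title" inside the parentheses.
const INLINE_LINK_DEST = /\]\(([^()\s]+)\)/g;
// A backslash followed by an ASCII punctuation character.
const HAS_ESCAPE = /\\[!-\/:-@\[-`{-~]/;

// Hypothetical helper: list every inline link destination containing an escape.
function findEscapedDestinations(markdown) {
  const hits = [];
  for (const match of markdown.matchAll(INLINE_LINK_DEST)) {
    if (HAS_ESCAPE.test(match[1])) hits.push(match[1]);
  }
  return hits;
}
```

Running it over each file in build/ (for example with fs.readFileSync) would give a quick list of pages to check.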

However, this isn't technically a bug: the CommonMark standard, which remark follows, allows a backslash before any ASCII punctuation character, and the renderer is supposed to remove it when producing HTML. You can see for yourself.
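The rule a CommonMark renderer applies here can be sketched as a plain string function. This is an illustration of the spec's backslash-escape rule, not remark-html's actual code, and unescapeMarkdown is a hypothetical name. A backslash is only an escape when it precedes an ASCII punctuation character; before anything else it stays literal.

```javascript
// CommonMark: "\" before an ASCII punctuation character is an escape.
// The renderer drops the backslash and keeps the character.
const ASCII_PUNCT_ESCAPE = /\\([!-\/:-@\[-`{-~])/g;

function unescapeMarkdown(text) {
  return text.replace(ASCII_PUNCT_ESCAPE, '$1');
}

console.log(unescapeMarkdown('https://example.org/?a=1\\&b=2')); // https://example.org/?a=1&b=2
console.log(unescapeMarkdown('/my\\_page'));                     // /my_page
console.log(unescapeMarkdown('\\a'));                            // \a (not punctuation, kept)
```

Because the regex scans left to right without overlapping matches, an escaped backslash (\\) correctly collapses to a single literal backslash.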

Gridsome's rendering

For some reason, Gridsome's Markdown processors don't remove this backslash. I thought it used remark-html, but even the old version Gridsome uses (8.0.0) removes this backslash. I'd have to dig back into its plugins to find out why.

Solutions

Even if we can't fix the root cause, because it lies in some old package that Gridsome hasn't updated, there are other, darker possibilities.

In the preprocessing step, I could serialize the Markdown tree back to a string, then parse it again and use the position data in the tree to do some surgery on the serialized Markdown string: extract the URLs, fix them, then patch them back into the string. I already do some of this in fix-links.mjs (see fixHtmlLinks() and editProperty()).
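A rough sketch of that surgery, assuming the re-parsed tree is mdast-shaped: link nodes have type "link", a url field holding the parsed (already unescaped) destination, and unist position offsets into the serialized string, with end.offset pointing just past the node. collectLinks and patchLinkUrls are hypothetical names, not the functions in fix-links.mjs. The sketch only touches inline links of the form [text](destination) and skips anything with a title or a URL that would itself need escaping.

```javascript
// Collect every mdast `link` node in a tree, depth-first.
function collectLinks(node, out = []) {
  if (node.type === 'link') out.push(node);
  for (const child of node.children ?? []) collectLinks(child, out);
  return out;
}

// Replace each inline link's raw destination in `markdown` with the node's
// parsed url, locating it through the node's source offsets.
function patchLinkUrls(markdown, links) {
  // Patch from the end backwards so earlier offsets stay valid.
  const ordered = [...links].sort(
    (a, b) => b.position.start.offset - a.position.start.offset,
  );
  let out = markdown;
  for (const link of ordered) {
    const { start, end } = link.position;
    const raw = out.slice(start.offset, end.offset);
    // Only handle inline links shaped like [text](destination).
    const open = raw.lastIndexOf('](');
    if (open === -1 || !raw.endsWith(')')) continue;
    const dest = raw.slice(open + 2, -1);
    // Skip titled links, and URLs that would need escaping to stay valid.
    if (/\s/.test(dest) || /[\s()<>]/.test(link.url)) continue;
    const patched = raw.slice(0, open + 2) + link.url + ')';
    out = out.slice(0, start.offset) + patched + out.slice(end.offset);
  }
  return out;
}
```

The patched string would then be written to build/ directly; serializing the tree again with remark would reintroduce the escapes.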

NickSto, Oct 28 '21 19:10