galaxy-hub
Characters in hyperlinks getting escaped
The Problem
Our current version of remark is breaking links which include characters like & and _ by trying to escape them with a preceding backslash.
Solutions
Currently, the only workaround is to use HTML for your hyperlinks instead of Markdown, which is super suboptimal.
The best solution would be to update remark, but it's Gridsome's dependency, so at the moment we're stuck with this version.
Another not-great workaround is to use JavaScript to try to recognize these broken links and rewrite them.
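A minimal sketch of what that JavaScript workaround might look like, assuming the only damage is a stray backslash in front of & or _ inside the href attribute. The function names `fixEscapedHref` and `fixEscapedLinks` are hypothetical and not part of the existing codebase, and this does not touch the visible link text.

```javascript
// Hypothetical helper: strip the backslashes left in front of "&" and "_"
// in a raw href string. Only these two characters are handled, because
// they are the ones reported in this issue.
function fixEscapedHref(href) {
  return href.replace(/\\([&_])/g, "$1");
}

// Rewrite every link under `root` whose raw href attribute contains such an escape.
// `root` is anything with querySelectorAll(), e.g. `document` in a browser.
// Returns the number of links that were changed.
function fixEscapedLinks(root) {
  let changed = 0;
  for (const link of root.querySelectorAll("a[href]")) {
    // Read the raw attribute: the `href` property is the resolved URL,
    // not the text that was written into the HTML.
    const raw = link.getAttribute("href");
    const fixed = fixEscapedHref(raw);
    if (fixed !== raw) {
      link.setAttribute("href", fixed);
      changed += 1;
    }
  }
  return changed;
}
```

In a browser this could be called as `fixEscapedLinks(document)` once the page content has rendered. It only papers over the symptom, which is why it's listed as a not-great option.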
Examples
- /events/2021-09-papercuts/ originally had a link to timeanddate.com that got completely munged.
- /news/2021-05-galaxy-update/ has (as of 54d4600e13) a DOI link that's now broken: https://doi.org/10.1007/978-1-0716-1307-8\_20
Actually, it seems the DOI case is a little different: it only happens if it's a bare URL in the Markdown, not an explicit Markdown link. Once it's a Markdown link, the backslash disappears. But this doesn't seem to happen with ampersands.
Note: The original issue this came up in is #800, but I wanted to make this a more general issue, not just about that instance.
Update
It looks like it's a combination of two factors:
- The preprocessing step inserts the backslashes.
- Gridsome incorrectly preserves the backslashes in the resulting HTML.
Preprocessing
mdfixer.mjs parses each Markdown file to fix several content issues, then serializes the Markdown into the build/ directory. It uses remark to do this, which escapes characters like &, even in hyperlink paths.
However, this isn't technically a bug because Markdown is supposed to allow backslashes like that (according to the CommonMark standard followed by remark). They're supposed to be removed when rendering into HTML. You can see for yourself.
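To make the rule concrete, here is a rough sketch of what a renderer is expected to do with these escapes: in CommonMark, a backslash in front of an ASCII punctuation character is an escape and gets dropped, while a backslash in front of anything else stays literal. The function name `unescapeBackslashes` is made up for illustration, and the sketch is simplified: it ignores the contexts where escapes don't apply (code spans, code blocks, autolinks, raw HTML).

```javascript
// Simplified illustration of CommonMark backslash escapes (hypothetical helper).
// A backslash followed by ASCII punctuation is replaced by that punctuation
// character; a backslash followed by any other character is left alone.
function unescapeBackslashes(text) {
  // The character class covers the ASCII punctuation ranges:
  // ! to /, : to @, [ to `, and { to ~
  return text.replace(/\\([!-\/:-@\[-`{-~])/g, "$1");
}

const broken = "https://doi.org/10.1007/978-1-0716-1307-8\\_20";
const rendered = unescapeBackslashes(broken);
// `rendered` no longer contains the backslash in front of the underscore.
```

This is the behavior Gridsome's output is missing for these links.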
Gridsome's rendering
For some reason, Gridsome's Markdown processors don't remove this backslash. I thought it uses remark-html, but even with the old version Gridsome uses (8.0.0), remark-html removes this backslash. I'd have to dig back into its plugins to find out why.
Solutions
Even if we can't fix the root cause because it's due to some old package that Gridsome hasn't updated, there are other... darker possibilities.
In the preprocessing step, I could serialize the Markdown tree back to a string, then parse it again and use the position data in the tree to basically do some surgery on the serialized Markdown string: extract the URLs, fix them, then patch them back into the string. I actually already do some of this in fix-links.mjs (see fixHtmlLinks() and editProperty()).
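The string surgery could look roughly like the sketch below. It is not the code in fix-links.mjs; `patchSpans` is a hypothetical helper, and it assumes the spans are non-overlapping character offsets of the kind found in a node's position.start.offset and position.end.offset after re-parsing. Here the span is located with indexOf just to keep the example self-contained.

```javascript
// Replace several slices of `text` in one pass (hypothetical helper).
// `spans` is a list of { start, end } character offsets (end exclusive).
// `fix` receives each original slice and returns its replacement.
function patchSpans(text, spans, fix) {
  // Work from the last span to the first so earlier offsets stay valid
  // even when a replacement changes the string length.
  const ordered = [...spans].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { start, end } of ordered) {
    result = result.slice(0, start) + fix(text.slice(start, end)) + result.slice(end);
  }
  return result;
}

// Example: un-escape only inside the URL, leaving other escapes untouched.
const serialized = "DOI: https://doi.org/10.1007/978-1-0716-1307-8\\_20 (see \\*this\\*)";
const badUrl = "https://doi.org/10.1007/978-1-0716-1307-8\\_20";
const urlStart = serialized.indexOf(badUrl);
const patched = patchSpans(
  serialized,
  [{ start: urlStart, end: urlStart + badUrl.length }],
  (slice) => slice.replace(/\\([&_])/g, "$1")
);
```

Restricting the fix to the URL spans is the point of using position data: escapes elsewhere in the document, which may be intentional, are left as remark wrote them.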