Migrate `markdown-gfm` to `remark` and `rehype`
Suggested merge commit message (convention)
Other (markdown-gfm): Migrate to remark / rehype packages. Closes #18684.
MINOR BREAKING CHANGE (markdown-gfm): Migrate from marked and turndown to remark and rehype.
MINOR BREAKING CHANGE (markdown-gfm): Enable autolinking in Markdown (works only when loading Markdown content into the editor).
Things to prioritize when testing this PR
The following areas should be prioritized when testing this PR, as they previously had (or have now) custom handling:
- Lists – both those loaded into the editor and those created within the editor (especially to-do lists).
- Links – regular links and plain text links like
www.example.comthat are converted into links when loaded into the editor. - Code blocks
- HTML embedded within the Markdown
- The
keepHtmlfunction (which appears to lack documentation aside from the API reference) - Loading Markdown generated by the older version of the integration into the new one
A few thoughts on the migration
The remark and rehype combo works more predictably – likely because they come from the same ecosystem – compared to marked and turndown, which are developed independently. This is also reflected in the tests, which needed to be updated to pass.
One benefit of the older setup was its smaller size. Migrating from marked and turndown to remark and rehype increases the build size by 39.95 KiB gzipped.
The unifiedjs ecosystem (which includes remark and rehype) is also much more modular. The old setup required “only” 5 direct external dependencies, while the new one requires 14. However, the following plugins are foundational and are already dependencies of the larger ones:
-
hast -
hast-util-from-dom -
hast-util-to-html -
hast-util-to-mdast -
unist-util-visit
Challenges
There were two major challenges when implementing this change.
Autolinking
The first was autolinking. The original implementation had autolinking (automatically turning text links like http://example.com into <http://example.com>) disabled. While this wasn't compliant with the GFM spec, there may have been reasons for it. However, we didn’t treat text links like regular text either, because we explicitly disabled escaping of them. This meant they were neither a link nor plain text.
As a result, Markdown returned by the editor could include unescaped text links, which would later be turned into links by other Markdown parsers. We decided to enable autolinking so text links are turned into regular links when loaded in the editor. Text links typed directly into the editor will not be turned into links, but instead escaped like regular text –providing a better WYSIWYG experience.
// Markdown => HTML
[www.example.com](www.example.com) => <a href="www.example.com">www.example.com</a>
www.example.com => <a href="www.example.com">www.example.com</a>
// HTML => Markdown
<a href="www.example.com">www.example.com</a> => [www.example.com](www.example.com)
www.example.com => www\.example.com
Bundle size
Another issue we anticipated was bundle size. Typically, Markdown and HTML parsing happens on the server, so most packages are built with that environment in mind. However, unlike the browser, server environments don't have built-in DOM parsers and instead use packages like parse5. While parse5 is excellent, it's completely unnecessary in a browser environment, which already has a built-in DOM parser.
The rehype-dom-parse and rehype-dom-stringify packages are DOM-based alternatives to rehype-parse and rehype-stringify, and they resolved most of the related issues. One issue they didn’t solve was parsing HTML inside Markdown. Unfortunately, there doesn’t appear to be a first-party solution for this, which necessitates writing a small custom plugin.
You can find more information here: https://github.com/orgs/rehypejs/discussions/202. While this discussion hadn’t received an answer at the time of writing, it provides useful context for the issue, so I’m leaving the link for “future us”.
~~PR is still WIP, I just need CI to run the tests~~
EDIT: PR is ready for review.
I assume that
LICENSE.mdfile must be updated also.
Done :heavy_check_mark:
Markdown output doesn't contain nested to-do list items
Steps
- Open markdown.html
- Insert a to-do list with nested items
Result
Nested items are missing from the output:
Markdown output doesn't contain nested to-do list items
Fixed
Found one case that does not work the same way as previously: adding a to-do list with code block.
Nice catch @psmyrek. This is an upstream bug that I reported here. However, this doesn't work in the current implementation either, so I don't think it's a blocker. I added a test with proper context in the comment and mentioned this case in a list of known issues.
New lines in tables are missing from the output
Steps
- Open markdown.html
- Insert a table
- Add some text inside a table cell
- Press enter and add some text in a new line
Result
Newline is missing from the output:
New lines in tables are missing from the output
Good catch. This didn't work in the old implementation too, so I don't think it's a blocker. We will discuss whether to fix this now or as a follow-up.
PR is ready for technical re-review.
@filipsobol We've finished testing the changes with @juliaflejterska and apart from reported issues the changes look good to us 👌