pulsar [core] `atom.ui.markdown` Refactor

After some time has passed with our new Markdown parsing internal API, there's been one major issue and a few smaller ones, which have prompted me to give this some love before it becomes harder to change (due to wide usage)

The Syntax Highlighting was unclear, and difficult to work with. Previously to syntax highlight, you had to pass objects around three different functions, with special options to ensure it worked, and was otherwise unclear and painful to work with.
The arguments that can be passed didn't make much sense to end users, while they made sense during implementation defaultEmoji doesn't make sense when there's no way to implement a custom emoji handler.
The code wasn't as clean as I'd like it to be, especially if we consider we might expand more on it later
There is no way to avoid having to do the entire MarkdownIT setup over and over again.

This PR aims to solve all of the above issues.

Converting our previous few functions available for use, to the following:

render()
buildRenderer()

Now these should be obvious and clear on what they do. The render function obviously renders Markdown content, meanwhile the buildRenderer function is only used to build a renderer instance.

This time around I've put an emphasis on optionality, where ideally nearly all options can be left out and things will still work perfectly fine.

`buildRenderer`

A completely optional function. This will return a MarkdownIT instance that a package can then store itself, this way it can avoid having to rebuild the markdown instance on every call, if that setup never changes. Such as the notifications package.

`render`

This function will obviously be in charge of rendering content. It takes three parameters:

content: This is the text you want to parse
givenOpts: The arguments you want to pass
mdInstance: A MarkdownIT instance to use for rendering.

But of course, the mdInstance is optional, if one isn't provided your full givenOpts will be passed to the buildRenderer function to build your renderer, then render it.

As you may have guessed, syntax highlighting is much simpler. To enable it all that you need to do is pass { highlight: true } as one of your arguments, and it will be enabled. Although, it likely won't detect the language properly for syntax highlighting (yet), as the highlight: true will enable syntax highlighting, the Markdown API assumes you are providing a function that can resolve a Fenced Code Block Scope to a Pulsar syntax scope, such as js => source.js, so you could also do this { highlight: scopeResolvingFunc } and syntax highlighting will now work, with your custom resolver.

Of course I would love to include a default resolver into this API, but considering it's a bit beefy (look at the one I made for markdown-preview I'm not sure it should be included unless it's majorly simplified, or otherwise markdown-preview relies on this API in full.)

So ideally this can resolve the above issues mentioned, and we can make sure this API is a bit more long lasting.

Parameters

On another note, you may have noticed some option names have changed, such as syntaxScopeNameFunc => highlight these changes ensure to fallback to the old names in every single case, so that if everything works correctly, a consumer wouldn't have to update anything to continue using this API after this PR is merged (unless of course they had previously used the Syntax Highlighting)

Options

Last thing of important note, I've changed some defaults to the options. Notably I've changed the following options: (Where the first value is what it was previously, and the second value is what it is now)

transformImageLinks: true => false
transformNonFqdnLinks: true => false

The reason I've changed these is that manipulation of the content like this feels like something that should really be opt in behavior when it comes to users local files, as it may not always be needed, especially for a community package implementing this API, I'd rather than changes be obvious.

But you'll notice I've left transformAtomLinks as the default true since that feels more helpful to ensure we are always transforming links that will only link to the GitHub blog about Atom's sunset, this transformation only aims to preserve existing functionality.

Jan 26 '24 08:01 confused-Techie

This one is now ready for review!

Jan 30 '24 15:01 confused-Techie

@mauricioszabo and I were discussing the issues of this PR on Discord, but to open up the discussion to more people I'll post it here in hopes to find a resolution everyone's happy with.

Essentially, Mauricio isn't a fan of the changing return value here, which I can totally understand. Since as written the return value of this function is html text most of the time, but if syntax highlighting is enabled, then we get a promise returned, which once resolved is an HTML Fragment.

Mauricio last suggested that we always return an HTML fragment, with an attribute that is a promise if syntax highlighting is applied, which could then be awaited on.

Although personally, I feel that returning a fragment all the time is problematic, since the vast majority of use cases for this API utilize the html text already, and I'd hate to rework more code in other packages here.

Myself, I'd vote to returning a document with the following values:

text: The raw HTML string
html: A document fragment
syntax: A promise that could return the html element once resolved

This way all three possibilities are always available, and in the event someone didn't enable syntax highlighting then the syntax promise would just never resolve.

But please, add any other opinions here if available

Mar 01 '24 02:03 confused-Techie

@confused-Techie I think it's a good compromise - I don't believe returning the text and the fragment would be too costly :)

One change I suggest is that if syntax highlighting is not enabled, the promise is just resolved immediately - so that one doesn't need to check for "if syntaxEnabled, wait for promise" or something like that.

Mar 01 '24 02:03 mauricioszabo

@mauricioszabo Great idea on resolving the promise right away, should the promise still return the html fragment without syntax highlighting?

Mar 01 '24 02:03 confused-Techie

Given the following assumptions:

rendering markdown can always be done synchronously
performing syntax highlighting must always be done asynchronously

I can't help but think they should be separate methods.

That said, if we're OK with the method always being async, then that's an easy way of solving it. If we want to cater to a simpler use case for people who just want to render markdown synchronously, then we could expose a separate method that only does the markdown parsing.

Mar 01 '24 02:03 savetheclocktower

@savetheclocktower Your assumptions are absolutely correct, and that's totally valid that exposing a separate function may be the best solution.

I was hoping to move away from multiple functions as the initial implementation was awkward, but I suppose having just two of render and highlight wouldn't be too unwieldy, and resolves the conversation of proper returns completely.

@mauricioszabo What do you think? Seem like a viable path forward? Since this may be the simplest answer

Mar 01 '24 02:03 confused-Techie

To be clear, I'm fine with one method doing everything as long as it's always async. I don't have a problem with consuming a method that goes async even when it could technically perform the task I want in synchronous fashion. If people wanted a sync version, then we could add another method, but I'm not specifically asking for that.

Mar 01 '24 02:03 savetheclocktower

So my also 2 cents - while syntax highlighting is always async, the actual HTML fragment return is sync - it's just that after the element is present, the UI is updated with the highlighting (I know that because that's what I'm doing with the old API).

So, I suggest the method to always be sync, actually, and the syntax is just a promise that resolves when the highlighting ends. Look at the code:

    return async () => {
      await Promise.all(promises);
      return domHTMLFragment;
    };

We actually don't do anything with the promises, so my suggestion is to return something like:

return {
  syntax: Promise.all(promises).then(_ => true),
  html: domHTMLFragment
}

And that's it - if promises is an empty array, it'll "resolve" immediately so things still work.

EDIT: explaining my reasoning, a consumer of this API could decide to render the element right away or choose to wait for the promises before rendering the element, and both will work just fine.

Mar 01 '24 15:03 mauricioszabo

I'm revisiting this PR because I'm working on a blog post with lots of code examples in it, and markdown-preview is getting slower and slower to re-render at a rate that increases with each new code block I add. This issue is present no matter which parser markdown-preview uses, but it's painful and it needs to be fixed in a way that I don't think the atom.ui implementation can deliver.

Whenever we call render on some Markdown with the appropriate options, we are asking Pulsar to highlight any code blocks it finds. For each code block, that process involves

creating an instance of TextEditor,
making it read-only and doing some other funky DOM stuff to it, and
adding it in place of the pre element that was ordinarily there.

In the case of markdown-preview, this happens whenever the buffer stops changing. We're wise enough not to try to re-render with each keystroke, choosing instead to wait until something like 200ms have elapsed with no new changes, but that doesn't help a lot. In my case, once I got to around 30 code blocks, the editor started freezing for about a second with each re-render.

I dug into it. Not surprisingly, the bulk of the cost revolves around the TextEditor stuff. Markdown is fast to re-render — and even if it weren't, it could be moved off the main thread into a web worker — but creating TextEditors must be done on the main thread.

We are very cavalier about throwing the old results away and creating new ones. Amazingly, though, the bulk of the cost is paid in style recalculations; this call to isVisible can freeze the main thread for as much as half a second at a time because reading offsetWidth or offsetHeight forces the browser to immediately apply a bunch of DOM operations it was saving for later.

There's probably an optimization to be made there. But even within the status quo, this cost wouldn't have to be paid if we could re-use TextEditor instances, and especially if we didn't have to move them around, since that's what triggers their connectedCallbacks and their costly are-we-visible-or-not logic. To make this possible, we should stop doing the wasteful thing we're doing — replacing the entire contents of the element each time the Markdown source changes. We seriously set textContent to an empty string and then dump in a new document fragment every time. We need to minimize the cost we pay to create TextEditor instances, and the first step is to keep content on the page longer.

I did a proof-of-concept with a library called morphdom that was quite successful. It does a virtual-DOM-style operation without the virtual DOM; it compares the existing content to the desired content (in either string or node form) and applies the minimum number of changes to the existing content in order to get them to match. It wasn't hard to configure it to ignore atom-text-editor elements during this process, leaving me with a pretty good flow for re-renders:

Call renderMarkdown and get a document fragment.
Clone the existing container node (without cloning its children) and append the fragment as a child of that node. We do not try to apply syntax highlighting yet; this means we keep the pre elements around for now.
Call morphdom(existingContainer, newContainer, options); this changes the existing container in increments so that its contents are identical to those of newContainer. (The options argument skips consideration of atom-text-editor and does some other stuff.)
After this is done, we can do a highlighting sweep. Any pre element that doesn't have a corresponding atom-text-editor gets one created. We pay the startup cost there, but we then cache the TextEditor instance by using the pre element as the key; since the pre nodes will be preserved wherever possible between renders, this works great for preventing excessive TextEditor creation. If the contents of a pre changed, we change the text in the TextEditor; same for if the language changes. But if a given change to the source didn't affect any code blocks at all, then nothing needs to change about the editors, either.
Instead of removing the pre elements — which would thwart my plan to use them as cache keys — I simply have a style rule that hides them: atom-text-editor + pre { display: none; }. Since the atom-text-editor is inserted just above the pre, this works as if by magic.
Any “orphaned” editors (from, say, a code block getting destroyed) get cleaned up and removed from the DOM.

This is what I spent a couple hours on today just because it bugged the hell out of me, and it got markdown-preview back to a usable state for this one particular Markdown file. This approach works great with the original markdown-preview parser, but I can't port it over to the atom.ui approach because it doesn't understand a lot of markdown-preview's concerns here, and I'm not even sure it ought to. A better idea would be for markdown-preview to handle syntax highlighting itself and not ask atom.ui to do it, since we have no good way to make atom.ui prevent the wasteful creation and destruction of TextEditor instances.

What I'll do instead is opt out of syntax highlighting from atom.ui and manage it within markdown-preview the same way for both Markdown parsers.

This is my extremely long way of saying that code highlighting should not be part of the atom.ui library — particularly if our approach to syntax highlighting is to delegate the task to a TextEditor. It's a footgun. I know we thought that the problem was the weird API, but I think the problem is that we're trying to do something really weird in the first place.

If we offer syntax highlighting as a feature, we need to do it in a much less costly way than this, and within an API that doesn't create and then dispose of so many different TextEditor instances. Let's take it out of the API sooner rather than later — before anyone is really relying on it. The simple API can go in atom.ui and then the advanced version with syntax highlighting can be provided by markdown-preview as a service once we figure out how to do it better.

Apr 21 '24 00:04 savetheclocktower

Well, too late - I'm already relying on it :scream:

But that's fine, I can revert-ish my work.

Apr 21 '24 04:04 mauricioszabo

So, my 2~ish cents on this:

I think a Markdown API with Syntax Highlighting is a good idea - maybe even necessary - for our ui API. Usually, people won't do the same thing as Markdown Preview - they will highlight a single thing, and that's it.

BUT - I think it's also a good idea for us to not reinvent the wheel - like, if we do markdown + Syntax Highlight in a core plug-in, and offer the same thing as an API, that API should be sufficient for the core plug-in. So I vote on reverting this for now, and then thinking about a way to make an API that allows for incremental rendering, if that's possible, and use it on Markdown Preview.

So, @savetheclocktower, if you can share the Markdown you were writing, or at least make an example, maybe we can discuss a way to make it work? What do you think?

Also, thanks for finding this out! Better sooner than later :)

Apr 21 '24 04:04 mauricioszabo

So, @savetheclocktower, if you can share the Markdown you were writing, or at least make an example, maybe we can discuss a way to make it work? What do you think?

Funny you should mention! We can discuss it in #984.

Apr 21 '24 05:04 savetheclocktower

@savetheclocktower I appreciate such a thorough review on this one.

I do want to quickly shift some of the blame of the syntax highlighting to say that the issues experienced do likely already exist in markdown-preview since we have copied their methodology for syntax highlighting to this new API with hardly any change. But due to this API not being very optimized it doesn't surprise me to see it exacerbate the weak points and make usage of the feature even worse.

But you do make a great case towards prioritizing services, rather than core API's for big important features like this, and that's something I can totally get behind if we do think it's the best way forward, so if we think so it's something I'd be happy to take a look at if you haven't already beat me too it.

In the meantime, as for the work on this PR, it sounds like it may be best to go ahead and close this one out, in favour of a different PR that simplifies or even removes it, in favour of a service. I know removal might be worrisome, but considering I think Mauricio is the only one to actually rely on it, I can't imagine it'd have much impact.

Thanks again for such a thorough review

Apr 23 '24 14:04 confused-Techie

I do want to quickly shift some of the blame of the syntax highlighting to say that the issues experienced do likely already exist in markdown-preview since we have copied their methodology for syntax highlighting to this new API with hardly any change.

Yeah, I certainly do wonder why they made some of the choices they did. I changed some of them in #984 and I think it's an improvement.

Apr 23 '24 16:04 savetheclocktower