micromark-extension-directive
Trailing whitespace in labels is elided
Initial checklist
- [X] I read the support docs
- [X] I read the contributing guide
- [X] I agree to follow the code of conduct
- [X] I searched issues and couldn’t find anything (or linked relevant results below)
Affected packages and versions
[email protected] [email protected] [email protected]
Link to runnable example
https://astexplorer.net/#/gist/b3ff9dc85d8e49ef94791c73e645646f/21c9658bf3272b1922fb07d889351d8482c1c552
Steps to reproduce
In this AST Explorer demo, observe the parse tree for :redact[secret ]word.
(The same issue reproduces on my local machine using node@20, bun, macOS 14.1.2, and no build tools).
Expected behavior
I expect the textDirective node's child text node to have value "secret " (with the trailing space).
Actual behavior
The textDirective node's child text node has value "secret" (without the trailing space). The trailing space isn't represented anywhere else in the parse tree.
Runtime
Other (please specify in steps to reproduce)
Package manager
Other (please specify in steps to reproduce)
OS
Other (please specify in steps to reproduce)
Build and bundle tools
Other (please specify in steps to reproduce)
I spent some time digging into this and trying to produce a fix. As far as I can tell, what's going on is:
- Directive label contents are treated as a nested tokenization task by emitting a chunkText token surrounding their interior. micromark:subtokenize eventually transforms the chunkText token into a data token which still contains the trailing whitespace.
- The final step of the subtokenization process calls micromark:resolveAllLineSuffixes, which splits the trailing whitespace out of the data token into a lineSuffix token.
- mdast-util-from-markdown compiles the contents of the data token into the text node's value, but ignores the lineSuffix contents.
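To make the three steps above concrete, here is a toy model in plain JavaScript. It is only a sketch of the behavior as described, not micromark's or mdast-util-from-markdown's actual code; the function names `splitLineSuffix` and `compileText` are hypothetical and exist only for this illustration.

```javascript
// Toy model of the steps described above; NOT the real micromark implementation.
// `splitLineSuffix` and `compileText` are hypothetical names for illustration.

// Models the line-suffix resolution: trailing spaces/tabs are split out of
// a `data` token into a separate `lineSuffix` token.
function splitLineSuffix(dataValue) {
  const match = /[ \t]+$/.exec(dataValue)
  if (!match) return [{type: 'data', value: dataValue}]
  const tokens = []
  if (match.index > 0) {
    tokens.push({type: 'data', value: dataValue.slice(0, match.index)})
  }
  tokens.push({type: 'lineSuffix', value: match[0]})
  return tokens
}

// Models a compiler that only reads `data` tokens, so the suffix is dropped.
function compileText(tokens) {
  return tokens
    .filter((token) => token.type === 'data')
    .map((token) => token.value)
    .join('')
}

// The label interior of `:redact[secret ]word`.
const tokens = splitLineSuffix('secret ')
const value = compileText(tokens)
console.log(JSON.stringify(value)) // prints "secret"
```

In this model the trailing space survives tokenization (it sits in the `lineSuffix` token) and is lost only at the compile step, which is why the defect could plausibly be placed in either stage.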
Assuming you consider this behavior to be a bug, I'm not sure whether you'd consider the defect to be in the micromark behavior (this nested content is not, in fact, at the end of a line) or in the compiler behavior (perhaps the lineSuffix should be emitted in this nested context?).
PS: Thank you for all your hard work!
Thanks for the investigation and your kind words!
resolveAllLineSuffixes exists because trailing spaces on a line are not “emitted”/“rendered”: a -> <p>a</p>. This seems like a bug there, as in :x[y ]z , the space after y should not be a “line suffix” but the one after z should be.
I’m not 100% sure this is a bug. Given a -> <p>a</p>, # b -> <h1>b</h1>, # c # -> <h1>c</h1>. Why would this here be different?
Why do you want trailing whitespace?
Thank you for digging in! It’s a fair question, and I don’t think it’s totally obvious that trailing whitespace should be preserved.
I’m hoping to use the inline text directive to produce behavior somewhat like other inline text directives—links, inline code, emphasis. Of those, the first two preserve trailing whitespace; the last does not, AFAICT to avoid spurious parse situations. In my mind, these directive labels are very congruent to link labels. Perhaps leaf and container directive labels are less obviously analogous to link labels than the text directive labels.
More concretely, I’m experimenting with text directives to create a “redact” markup for a flashcard system, and I imagined an interface where one can drag the handles of the redaction left to right across the text, character by character. It’s freeing to be able to drop the handle wherever, and feels weird if it “jumps” when I release it because the underlying representation can’t place the right edge in certain positions. I can make this work without actual syntactic support for the trailing space scenario, but it felt unintentional (given the line suffix token when it’s not actually at the end of a line) so I thought I’d write a bug.
Thanks for considering!
Links and emphasis are the same.
*b
c*
[b
c](#)
https://spec.commonmark.org/dingus/?text=%20b%20%0Ac%20%0A%0A%20%5Bb%20%0Ac%5D(%23)%20%0A
The thing with them though, is that they are parsed as separate things: *, *, [, ](). Everything goes from left to right.
So it’s the paragraph/heading parent, the content type (text), that deals with the trailing whitespace in the entire thing.
With content in the [ and ] of directives, it’s parsed separately. It’s as if it was its own paragraph or heading. Because it could be! That’s how directives work (also the leaf / container). You currently choose to use the content inside a paragraph (which I get). But it could be, say, a separate tooltip. It could be nice for folks to be able to pad with whitespace.
But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.
I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a.
And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ].
So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.
Note: you can use a character reference for the space btw, e.g. :x[y&#x20;]z
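As a sketch of how the character-reference workaround could be applied mechanically when generating markdown, the helper below rewrites leading and trailing spaces of a label as `&#x20;` (the numeric character reference for a space). The name `protectLabelEdges` is hypothetical, and the helper assumes the label contains no `]` or other characters that would need escaping.

```javascript
// Hypothetical helper: encode edge spaces of a directive label as character
// references so they are not treated as strippable whitespace.
// Assumption: the label contains no `]` or other characters needing escapes.
function protectLabelEdges(label) {
  return label
    .replace(/^ +/, (spaces) => '&#x20;'.repeat(spaces.length))
    .replace(/ +$/, (spaces) => '&#x20;'.repeat(spaces.length))
}

const source = ':redact[' + protectLabelEdges('secret ') + ']word'
console.log(source) // prints :redact[secret&#x20;]word
```

Interior spaces are left alone, so only the positions affected by the stripping behavior are rewritten.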
> But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.
> I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a.
Ah, great point. No, in a document like this, you're right that I would expect the leading/trailing line whitespace to be stripped:
:x[y
z]a
> And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ].
Right. I agree with your argument that stripping whitespace here matches the behavior of other flow-level nodes.
> So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.
Right. I guess I'd expect the lineSuffix resolution behavior when the whitespace in question is in fact a line suffix. (And likewise for prefixes)
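The behavior being agreed on here can be sketched as a small normalization function: whitespace next to an actual line ending is stripped, while whitespace at the outer edges of a text directive label is kept. This is only an illustration of the proposal, not anything micromark implements; `normalizeTextDirectiveLabel` is a hypothetical name.

```javascript
// Sketch of the PROPOSED behavior for text directive labels only.
// Hypothetical function; not part of micromark or its extensions.
function normalizeTextDirectiveLabel(label) {
  return label
    .split('\n')
    .map((line, index, lines) => {
      let result = line
      // Strip a line prefix only if a line ending precedes this line.
      if (index > 0) result = result.replace(/^[ \t]+/, '')
      // Strip a line suffix only if a line ending follows this line.
      if (index < lines.length - 1) result = result.replace(/[ \t]+$/, '')
      return result
    })
    .join('\n')
}

console.log(JSON.stringify(normalizeTextDirectiveLabel('secret '))) // prints "secret "
console.log(JSON.stringify(normalizeTextDirectiveLabel('y \n z'))) // prints "y\nz"
```

Under this rule `:redact[secret ]word` would keep its trailing space, while `:x[y \n z]a` would still lose the whitespace around the line ending.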
> Note: you can use a character reference for the space btw, e.g. :x[y&#x20;]z
Thanks!
> lineSuffix
Without a final end-of-file end-of-line, it’s still the end of the line (a vs a \n). As this whole thing is parsed separately, it’s the start of the thing and end of the thing, even though there’s no \n. But these are internals, the terms don’t mean much.
I remain unsure which is better: the current state or the proposed state. I can see arguments for both.
Fair enough! :) Thanks for your consideration.
One more thing I wanted to mention. You wrote "Links and emphasis are the same," and in your example, they are. I'm sure you know this, but I wanted to clarify that I was referring to the behavior of links when their label doesn't involve a line ending; i.e. [a ](#) does parse to a link containing a text node with value "a " (with the trailing space). It's in this sense that I was hoping :redact[a ] would behave. Likewise for `a `. Whereas *a * doesn't parse to an emphasis at all, because of the flanking rules.