Bug: Markdown export produces invalid markdown for overlapping formats
Steps To Reproduce
- In Lexical Playground, enter a string with overlapping formats, such as "helloworld!" formatted in normal, bold, bold+italic, italic, normal as shown here:
root
└ (53) paragraph
├ (48) text "he"
├ (49) text "llo" { format: bold }
├ (50) text "wor" { format: bold, italic }
├ (51) text "ld" { format: italic }
> └ (52) text "!"
- Click Export to markdown.
Actual: he**llo*wor**ld*!
Expected: he**llo**___wor___*ld*!, or any other valid encoding according to CommonMark
Paste those strings into the CommonMark dingus to see the difference: https://spec.commonmark.org/dingus/
I'm not sure if any markdown parser supports overlapping formats as they are produced by the Markdown export. CommonMark says they are not valid:
When two potential emphasis or strong emphasis spans overlap, so that the second begins before the first ends and ends after the first ends, the first takes precedence. Thus, for example,
*foo _bar* baz_ is parsed as <em>foo _bar</em> baz_ rather than *foo <em>bar* baz</em>.
Perhaps Markdown export should produce markdown one TextNode at a time, or one run of consecutive TextNodes that share the same format at a time, instead of treating the individual formats separately.
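A rough sketch of that per-run idea, to make the suggestion concrete. This is not the actual @lexical/markdown exporter: `TextRun`, `delimiterFor`, and `exportRuns` are hypothetical names, and `TextRun` is a simplified stand-in for a TextNode that only tracks bold and italic. The delimiter choice (`**`, `*`, `___`) simply mirrors the expected string above.

```typescript
// Hypothetical, simplified stand-in for a Lexical TextNode (not the real API).
interface TextRun {
  text: string;
  bold?: boolean;
  italic?: boolean;
}

// One delimiter per exact format combination, so every run is opened and
// closed by the same marker and delimiters of different runs never interleave.
function delimiterFor(run: TextRun): string {
  if (run.bold && run.italic) return "___";
  if (run.bold) return "**";
  if (run.italic) return "*";
  return "";
}

// Merge neighbouring runs that share the same format, then wrap each merged run.
function exportRuns(runs: TextRun[]): string {
  const merged: TextRun[] = [];
  for (const run of runs) {
    if (run.text === "") continue; // an empty run would emit bare delimiters
    const last = merged[merged.length - 1];
    if (last && !!last.bold === !!run.bold && !!last.italic === !!run.italic) {
      last.text += run.text;
    } else {
      merged.push({ ...run });
    }
  }
  return merged
    .map((run) => {
      const d = delimiterFor(run);
      return d + run.text + d;
    })
    .join("");
}

// The "helloworld!" example from the steps to reproduce.
const helloWorld: TextRun[] = [
  { text: "he" },
  { text: "llo", bold: true },
  { text: "wor", bold: true, italic: true },
  { text: "ld", italic: true },
  { text: "!" },
];

const markdown = exportRuns(helloWorld);
console.log(markdown); // he**llo**___wor___*ld*!
```

This only covers the delimiter-overlap problem. A real exporter would also have to handle runs that start or end with whitespace, escaping of literal `*` and `_`, and the other formats (code, strikethrough, and so on).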
Still reproducible in 0.22.0
Given the following markdown produced by the editor
*Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. **It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.***
Which should result in the following
It instead produces the following when imported
Workaround that worked for us:
Instead of importing all the transformers (TRANSFORMERS) from @lexical/markdown, we imported only the ones we needed and made sure the bold and italic transformers use different markdown delimiters (asterisks for bold, underscores for italic).
Notice the difference for bold and italic:
BOLD_STAR,
ITALIC_UNDERSCORE,
BOLD_ITALIC_UNDERSCORE,
HEADING,
ORDERED_LIST,
UNORDERED_LIST,
This produces the following markdown
_Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. **It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.**_
Which is correctly interpreted by the importer and produces the desired result.