Bug: Markdown export produces invalid markdown for overlapping formats

Open • robfig opened this issue 2 years ago • 2 comments

Steps To Reproduce

  1. In the Lexical Playground, enter a string with overlapping formats, such as "helloworld!" formatted as normal, bold, bold+italic, italic, normal, as shown here:
 root
  └ (53) paragraph  
    ├ (48) text  "he"
    ├ (49) text  "llo" { format: bold }
    ├ (50) text  "wor" { format: bold, italic }
    ├ (51) text  "ld" { format: italic }
>   └ (52) text  "!"
  2. Click Export to markdown.

Actual: he**llo*wor**ld*!

Expected: he**llo**___wor___*ld*!, or any other valid encoding according to CommonMark

Paste those strings into the CommonMark dingus to see the difference: https://spec.commonmark.org/dingus/

I'm not sure if any markdown parser supports overlapping formats as they are produced by the Markdown export. CommonMark says they are not valid:

When two potential emphasis or strong emphasis spans overlap, so that the second begins before the first ends and ends after the first ends, the first takes precedence. Thus, for example, *foo _bar* baz_ is parsed as <em>foo _bar</em> baz_ rather than *foo <em>bar* baz</em>.

Perhaps Markdown export should produce markdown for each TextNode at a time, or each set of consecutive TextNodes that share a common format, instead of treating the individual formats separately.
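A minimal sketch of that per-node idea, not Lexical's actual exporter: each run is wrapped on its own, with one delimiter chosen from its complete format set. The `TextRun` type and the `delimiterFor` and `exportRuns` helpers are hypothetical names made up for this illustration, and the delimiter choices simply mirror the expected output above.

```typescript
// Hypothetical, simplified stand-in for a Lexical TextNode: text plus two format flags.
interface TextRun {
  text: string;
  bold?: boolean;
  italic?: boolean;
}

// Choose a single delimiter from the run's complete format set,
// so delimiters are always opened and closed around the same run.
function delimiterFor(run: TextRun): string {
  if (run.bold && run.italic) return "___";
  if (run.bold) return "**";
  if (run.italic) return "*";
  return "";
}

// Serialize each run independently and concatenate the results.
function exportRuns(runs: TextRun[]): string {
  return runs
    .map((run) => {
      const delimiter = delimiterFor(run);
      return delimiter + run.text + delimiter;
    })
    .join("");
}

// The paragraph from the steps to reproduce.
const runs: TextRun[] = [
  { text: "he" },
  { text: "llo", bold: true },
  { text: "wor", bold: true, italic: true },
  { text: "ld", italic: true },
  { text: "!" },
];

const markdown = exportRuns(runs);
console.log(markdown); // he**llo**___wor___*ld*!
```

Because every delimiter is closed before the next run starts, spans can never cross. This sketch ignores cases a real exporter would have to handle, such as whitespace at the edges of a run, or an underscore-delimited run sitting directly next to letters or digits, where CommonMark does not treat the underscores as emphasis.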

robfig • Aug 16 '23 12:08

Still reproducible in 0.22.0

AlessioGr • Dec 31 '24 21:12

Given the following markdown produced by the editor:

*Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. **It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.***

Which should result in the following: [Image]

It instead produces the following when imported: [Image]

Workaround that worked for us:

Instead of importing all the transformers (TRANSFORMERS) from @lexical/markdown, we imported only the ones we needed and made sure the bold and italic transformers use different delimiters (stars for bold, underscores for italic).

Notice the difference between the bold and italic transformers:

BOLD_STAR,
ITALIC_UNDERSCORE,
BOLD_ITALIC_UNDERSCORE,
HEADING,
ORDERED_LIST,
UNORDERED_LIST,

This produces the following markdown:

_Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. **It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.**_

Which is correctly interpreted by the importer and produces the desired result.
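For reference, a sketch of how such a transformer list might be wired into both export and import. The constant name `MARKDOWN_TRANSFORMERS` and the two wrapper functions are made up for illustration, and the array order simply follows the list above; it has not been checked against a specific Lexical version.

```typescript
import type { LexicalEditor } from "lexical";
import {
  $convertFromMarkdownString,
  $convertToMarkdownString,
  BOLD_ITALIC_UNDERSCORE,
  BOLD_STAR,
  HEADING,
  ITALIC_UNDERSCORE,
  ORDERED_LIST,
  UNORDERED_LIST,
  type Transformer,
} from "@lexical/markdown";

// Hypothetical name: stars for bold, underscores for italic and bold+italic.
const MARKDOWN_TRANSFORMERS: Transformer[] = [
  BOLD_STAR,
  ITALIC_UNDERSCORE,
  BOLD_ITALIC_UNDERSCORE,
  HEADING,
  ORDERED_LIST,
  UNORDERED_LIST,
];

// Export the current editor state using only the transformers above.
export function exportMarkdown(editor: LexicalEditor): string {
  return editor
    .getEditorState()
    .read(() => $convertToMarkdownString(MARKDOWN_TRANSFORMERS));
}

// Import with the same list so both directions agree on the delimiters.
export function importMarkdown(editor: LexicalEditor, markdown: string): void {
  editor.update(() => {
    $convertFromMarkdownString(markdown, MARKDOWN_TRANSFORMERS);
  });
}
```

Passing the same list to both functions is the point of the sketch: the importer then only has to recognise the delimiters the exporter actually emits.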

bekworks • Feb 05 '25 12:02