flexmark-java

Bold and italic HTML tags with space are not converted properly to Markdown

Open Gustl22 opened this issue 7 months ago • 1 comments

Describe the bug

Bold and italic HTML text that contains leading or trailing whitespace inside the tag is not converted to valid Markdown.

https://spec.commonmark.org/dingus/?text=%3Cstrong%3EMy%20text.%20%3C%2Fstrong%3Eappend%0A%0AMy%20text.append%0A%0AappendMy%20text.%0A%0A%3Ci%3EMy%20text.%20%3C%2Fi%3Eappend%0A%0A_My%20text._append

Please provide as much information about where the bug is located or what you were using:

  • [ ] Parser
  • [ ] HtmlRenderer
  • [ ] Formatter
  • [x] FlexmarkHtmlParser
  • [ ] DocxRenderer
  • [ ] PdfConverterExtension
  • [ ] extension(s)

To Reproduce

import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

FlexmarkHtmlConverter converter = FlexmarkHtmlConverter.builder().build();

String html = "<p><strong>My strong text. </strong>append</p>";
String converted = converter.convert(html);

This returns

**My strong text.**append

It should return

**My strong text.** append
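
Until this is fixed in the converter, one possible workaround is to move the whitespace out of the inline element before conversion. The following is only a minimal sketch: `InlineWhitespaceHoister` and `hoistWhitespace` are hypothetical names, not part of flexmark-java, and the regex assumes simple, attribute-free, non-nested `<strong>`, `<b>`, `<em>` and `<i>` elements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of flexmark-java.
class InlineWhitespaceHoister {
    // Assumption: the element has no attributes and does not nest the same tag.
    private static final Pattern INLINE = Pattern.compile(
            "<(strong|b|em|i)>(\\s*)(.*?)(\\s*)</\\1>", Pattern.DOTALL);

    static String hoistWhitespace(String html) {
        Matcher m = INLINE.matcher(html);
        // StringBuffer keeps appendReplacement compatible with Java 8.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group(1);
            String lead = m.group(2);
            String body = m.group(3);
            String trail = m.group(4);
            // Elements that contain only whitespace collapse to that whitespace.
            String replacement = body.isEmpty()
                    ? lead + trail
                    : lead + "<" + tag + ">" + body + "</" + tag + ">" + trail;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><strong>My strong text. </strong>append</p>";
        // prints <p><strong>My strong text.</strong> append</p>
        System.out.println(hoistWhitespace(html));
    }
}
```

The normalized HTML could then be passed to `converter.convert(...)` as before. This only addresses whitespace at the edges of the element; it does not help when the element starts or ends with punctuation and directly touches a letter.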

Another example: Prepend<strong>. some</strong> is converted to Prepend**. some**, while it should probably be converted to Prepend. **some**

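As a result, a CommonMark-compliant Markdown parser does not recognize the bold markup: a ** that sits between a punctuation character and a letter does not qualify as an opening or closing delimiter under the spec's flanking rules. The same applies to italic.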
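
To illustrate why the generated Markdown is not rendered as bold, here is a rough sketch of the CommonMark left-flanking and right-flanking checks for a `*` or `**` delimiter run. `EmphasisFlankingCheck` is a hypothetical helper written for this issue; it simplifies the spec by treating only ASCII punctuation as punctuation and by ignoring the extra rules for `_`.

```java
// Hypothetical helper; a simplified reading of the CommonMark flanking rules for '*'.
class EmphasisFlankingCheck {
    private static final String ASCII_PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // -1 stands for the start or end of the line, which counts as whitespace.
    static int charAt(String s, int i) {
        return (i < 0 || i >= s.length()) ? -1 : s.charAt(i);
    }

    static boolean isSpace(int c) {
        return c == -1 || Character.isWhitespace(c);
    }

    static boolean isPunct(int c) {
        return c != -1 && ASCII_PUNCT.indexOf(c) >= 0;
    }

    // Left-flanking: a '*' run that can open emphasis.
    static boolean canOpen(String line, int start, int len) {
        int before = charAt(line, start - 1);
        int after = charAt(line, start + len);
        return !isSpace(after)
                && (!isPunct(after) || isSpace(before) || isPunct(before));
    }

    // Right-flanking: a '*' run that can close emphasis.
    static boolean canClose(String line, int start, int len) {
        int before = charAt(line, start - 1);
        int after = charAt(line, start + len);
        return !isSpace(before)
                && (!isPunct(before) || isSpace(after) || isPunct(after));
    }

    public static void main(String[] args) {
        String actual = "**My strong text.**append";
        String expected = "**My strong text.** append";
        System.out.println(canClose(actual, actual.lastIndexOf("**"), 2));     // false
        System.out.println(canClose(expected, expected.lastIndexOf("**"), 2)); // true

        String actual2 = "Prepend**. some**";
        String expected2 = "Prepend. **some**";
        System.out.println(canOpen(actual2, actual2.indexOf("**"), 2));     // false
        System.out.println(canOpen(expected2, expected2.indexOf("**"), 2)); // true
    }
}
```

In both reported outputs, one of the two `**` runs fails its check, so the pair cannot form strong emphasis and the asterisks are rendered literally.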

Gustl22 • May 28 '25 10:05

More test examples:

<p><em>Translated from German/</em>mga</p>
<p><em>Translated from German.</em>mga</p>
<p><strong>A word-</strong><strong>to</strong><strong>-one two-three </strong><strong>four </strong><strong>five six  </strong><strong>seven</strong><strong>eight?&nbsp; </strong><strong></strong></p>

Gustl22 • May 28 '25 17:05