flexmark-java

HTML to Markdown conversion leaves <br/> elements after <h1> elements unconverted.

Open: mandragorn opened this issue 4 years ago • 1 comment

Converting HTML that contains a heading (any <h#> element) followed by an HTML line break (<br/>) to Markdown with the basic HTML-to-Markdown converter produces Markdown that still contains a <br /> tag.

  • [x] FlexmarkHtmlParser
  • [x] extension(s): html2md

To Reproduce

import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

public class BrAfterHeadingSample {
  public static void main(String[] args) {
    // Default converter, no extra options set.
    final FlexmarkHtmlConverter converter = FlexmarkHtmlConverter.builder().build();
    // A heading immediately followed by a line break triggers the problem.
    final String html = "<h1>a</h1><br/>";
    final String md = converter.convert(html);
    System.out.println("Original HTML:\n" + html);
    System.out.println();
    System.out.println("Rendered markdown:\n" + md);
  }
}

Output

Original HTML:
<h1>a</h1><br/>

Rendered markdown:
a
===

<br />

mandragorn, Mar 22 '21 15:03

My current workaround is to remove every <br /> from the Markdown after conversion.

  public static String convertHtmlToMarkdown(String html) {
    if (html == null) {
      return null;
    }

    final String mdWithRogueBrs = HTML_TO_MD_CONVERTER.convert(html);
    // Remove this replace() call once the issue is fixed in flexmark: https://github.com/vsch/flexmark-java/issues/446
    return StringUtils.replace(mdWithRogueBrs, "<br />", "");
  }
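
A blanket replace also strips any <br /> that was meant to stay in the output. A narrower variant, sketched below, drops only lines that consist of nothing but a <br> tag, which is the shape of the stray tag in the output above. The class name RogueBrCleaner and the method stripStandaloneBrs are hypothetical names for illustration and are not part of flexmark; the sketch assumes the stray tag always ends up on a line of its own.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of flexmark: post-processes converter output.
final class RogueBrCleaner {
  // A line holding only a <br> tag (with or without the self-closing slash),
  // optional surrounding spaces or tabs, and its line terminator if present.
  private static final Pattern STANDALONE_BR =
      Pattern.compile("(?m)^[ \\t]*<br\\s*/?>[ \\t]*(\\r?\\n|$)");

  private RogueBrCleaner() {}

  // Removes standalone <br> lines; a <br> that shares a line with other text is kept.
  static String stripStandaloneBrs(String markdown) {
    if (markdown == null) {
      return null;
    }
    return STANDALONE_BR.matcher(markdown).replaceAll("");
  }
}
```

It would be called on the converter result, for example RogueBrCleaner.stripStandaloneBrs(HTML_TO_MD_CONVERTER.convert(html)). Note that it does not distinguish code blocks, so a line containing only <br /> inside one would be removed as well.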

mandragorn, Mar 24 '21 01:03