flexmark-java
HTML to Markdown conversion leaves <br/> elements after <h1> elements unconverted.
When HTML containing a heading (any <h#> element) followed by an HTML line break (<br/>) is converted with the basic HTML-to-Markdown converter, the resulting Markdown still contains a <br /> tag.
- [x] FlexmarkHtmlParser
- [x] extension(s): html2md
To Reproduce
import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

public class BrAfterHeadingSample {
    public static void main(String[] args) {
        final FlexmarkHtmlConverter converter = FlexmarkHtmlConverter.builder().build();
        final String html = "<h1>a</h1><br/>";
        final String md = converter.convert(html);
        System.out.println("Original HTML:\n" + html);
        System.out.println();
        System.out.println("Rendered markdown:\n" + md);
    }
}
Output
Original HTML:
<h1>a</h1><br/>
Rendered markdown:
a
===
<br />
My current workaround is to remove every <br /> from the output after conversion:
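import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;
import org.apache.commons.lang3.StringUtils; // assumed: StringUtils from Apache Commons Lang 3

public class HtmlToMarkdown {
    private static final FlexmarkHtmlConverter HTML_TO_MD_CONVERTER = FlexmarkHtmlConverter.builder().build();

    public static String convertHtmlToMarkdown(String html) {
        if (html == null) {
            return null;
        }
        final String mdWithRogueBrs = HTML_TO_MD_CONVERTER.convert(html);
        // Remove replace when this is addressed in flexmark: https://github.com/vsch/flexmark-java/issues/446
        return StringUtils.replace(mdWithRogueBrs, "<br />", "");
    }
}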
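A blanket replace also strips any <br /> text that legitimately appears elsewhere in the output, for example inside a code span. A narrower post-processing step could drop only a <br /> line that directly follows a heading: a setext underline of = or - characters, or an ATX # heading, optionally separated by blank lines. This is a minimal sketch under assumptions: BrAfterHeadingCleanup and stripBrAfterHeadings are hypothetical names, not part of flexmark, and the pattern assumes the converter emits the stray tag on its own line, as in the output shown above. It has not been checked against flexmark's other output options.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical post-processing helper (NOT part of flexmark): removes a stray
 * "<br />" line that directly follows a Markdown heading, leaving other
 * occurrences of "<br />" untouched.
 */
public final class BrAfterHeadingCleanup {

    // Group 1: a heading line, either a setext underline (=== or ---) or an ATX heading (# ...).
    //   Caveat: a line of dashes may also be a thematic break; a <br /> after one is removed too.
    // Group 2: any blank lines between the heading and the stray tag (kept as-is).
    // Then: a line holding only "<br />", dropped together with its line break (or end of input).
    private static final Pattern BR_AFTER_HEADING = Pattern.compile(
            "(?m)^((?:=+|-+|#{1,6} .*)[ \\t]*\\R)((?:[ \\t]*\\R)*)<br />[ \\t]*(?:\\R|\\z)");

    private BrAfterHeadingCleanup() {
    }

    public static String stripBrAfterHeadings(String markdown) {
        if (markdown == null) {
            return null;
        }
        // Keep the heading line and any blank lines; drop only the <br /> line.
        return BR_AFTER_HEADING.matcher(markdown).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // The heading's <br /> line is removed; the one inside the code span is kept.
        String md = "a\n===\n<br />\nSee `<br />` for line breaks.\n";
        System.out.println(stripBrAfterHeadings(md));
    }
}
```

To use it, the StringUtils.replace call in convertHtmlToMarkdown would be swapped for a call to stripBrAfterHeadings on the result of HTML_TO_MD_CONVERTER.convert(html).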