FlexmarkHtmlConverter: blockquotes can be used to trigger unbounded memory allocation
Describe the bug
I was considering using Flexmark as an HTML => text/plain engine for Apache James
(we currently rely on a homegrown Jsoup-based parser).
I threw our test suite at flexmark-html2md-converter and triggered an OutOfMemoryError after 18 seconds with the following code:
    @Test
    public void boom() {
        String html = ("<blockquote>" + "<p>a</p>".repeat(800)).repeat(400)
            + "</blockquote>".repeat(400);
        String plainText = FlexmarkHtmlConverter.builder()
            .build()
            .convert(html);
    }
This throws an OutOfMemoryError.
This is because:

- the input grows in O(N) with the blockquote nesting level;
- the output grows in O(N²) with the blockquote nesting level, because each paragraph is re-emitted with the prefix of all N enclosing blockquotes (a rough estimate is sketched below).
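For what it is worth, here is a back-of-the-envelope estimate of where the memory goes. It assumes the converter prefixes every emitted line with one "> " per enclosing blockquote (standard Markdown blockquote rendering); the constants are guesses, but the quadratic shape is the point:

    // Rough estimate only: assumes one "> " prefix per enclosing blockquote and
    // one prefixed blank line between paragraphs; exact constants will differ.
    public class BlockquoteAmplificationEstimate {
        public static void main(String[] args) {
            int depth = 400;              // blockquote nesting level (N)
            int paragraphsPerLevel = 800; // number of "<p>a</p>" per level
            long inputChars = 0;
            long outputChars = 0;
            for (int d = 1; d <= depth; d++) {
                // Input grows in O(N): one <blockquote>...</blockquote> pair
                // plus 800 paragraphs per level.
                inputChars += "<blockquote>".length()
                        + (long) paragraphsPerLevel * "<p>a</p>".length()
                        + "</blockquote>".length();
                // Output grows in O(N^2): every paragraph at depth d is re-emitted
                // with d "> " prefixes, plus a prefixed blank separator line.
                outputChars += (long) paragraphsPerLevel * (2L * d + 2);
            }
            System.out.println("input  ~ " + inputChars + " chars");  // ~2.6 million
            System.out.println("output ~ " + outputChars + " chars"); // ~129 million
        }
    }

Around 129 million chars is roughly 250 MB as a single UTF-16 String, before counting StringBuilder resizing along the way, which is more than enough to exhaust a default heap.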
Same code with different parameters:
String html = ("<blockquote>".repeat(420) +
"a<br/>".repeat(400 * 420))
+ "</blockquote>".repeat(420);
This generates 1 MB of input and 142 MB of output.
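That 142 MB is roughly what one would expect if each of the 400 * 420 = 168,000 "a" lines is prefixed with 420 "> " markers: 168,000 × (2 × 420 + 2) ≈ 141 million characters (again assuming one "> " per nesting level).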
Those sizes are well within the ranges I encounter in real-world emails.
Is there a way to limit allocated memory (i.e. the size of the output) and simply throw when that limit is exceeded, as a defense mechanism?
This would protect against DoS attacks through unbounded memory allocation, and would be a condition for adoption on our side.
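For illustration, the best I can do today on the caller side is a coarse pre-check along the lines of the sketch below (class name and thresholds are mine, not Flexmark's; the import is the flexmark-html2md-converter class already used above). It has to guess the amplification from the raw HTML, which is exactly why a converter-side output limit would be more robust:

    import com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter;

    // Caller-side stopgap, not a Flexmark feature: reject inputs that look like
    // they could amplify before handing them to the converter.
    // Thresholds are arbitrary examples.
    public final class GuardedHtmlToText {
        private static final int MAX_INPUT_CHARS = 1_000_000;
        private static final int MAX_BLOCKQUOTE_TAGS = 100;

        public static String convert(String html) {
            if (html.length() > MAX_INPUT_CHARS) {
                throw new IllegalArgumentException("HTML input too large");
            }
            if (countOccurrences(html, "<blockquote") > MAX_BLOCKQUOTE_TAGS) {
                throw new IllegalArgumentException("Too many blockquotes, refusing to convert");
            }
            return FlexmarkHtmlConverter.builder().build().convert(html);
        }

        private static int countOccurrences(String haystack, String needle) {
            int count = 0;
            for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) {
                count++;
            }
            return count;
        }
    }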
Similar amplification also exists with lists, e.g.:
    @Test
    public void boom() {
        String html = ("<ul>" + "<li>a</li>".repeat(400)).repeat(420)
            + "</ul>".repeat(420);
        System.out.println(html.length() + " bytes");
        String plainText = FlexmarkHtmlConverter.builder()
            .build()
            .convert(html);
        System.out.println(plainText.length() + " bytes");
    }
which prints:

    1683780 bytes
    71064000 bytes
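For reference, those 400 * 420 = 168,000 one-character list items average about 423 output characters each (71,064,000 / 168,000), i.e. roughly two characters of indentation per enclosing list at an average nesting depth of ~210 (assuming the converter indents nested list items by a fixed amount per level), so the growth pattern is the same quadratic one as for blockquotes.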