FlexmarkHtmlConverter: blockquotes can be used to generate unbounded memory allocation

Open chibenwa opened this issue 1 year ago • 1 comments

Describe the bug

I was considering using Flexmark as an HTML => text/plain engine for Apache James.

(We currently rely on a homegrown Jsoup-based parser.)

I threw our test suite at flexmark-html2md-converter and triggered an OutOfMemoryError after 18 seconds with the following code:

    @Test
    public void boom() {
        String html = ("<blockquote>" +
            "<p>a</p>".repeat(800))
            .repeat(400) + "</blockquote>".repeat(400);

        String plainText = FlexmarkHtmlConverter.builder()
            .build().convert(html);
    }

This throws an OutOfMemoryError.

This is because:

  • The input grows in O(N) with the blockquote nesting level N
  • The output grows in O(N^2) with the blockquote nesting level (each paragraph is prefixed once for every enclosing blockquote, so a paragraph at depth N carries N prefixes)
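To make the growth rates concrete, here is a rough back-of-the-envelope model. It is only a sketch: it assumes each paragraph at depth k is emitted as a single line carrying k two-character `> ` prefixes, one character of text and a newline, which approximates the converter's output but is not taken from Flexmark's code. The class and method names are made up for illustration.

```java
public class BlockquoteAmplification {
    // Exact length of the test input: N opening tags, each followed by P paragraphs,
    // then N closing tags.
    static long inputSize(int depth, int paragraphsPerLevel) {
        long open = "<blockquote>".length();   // 12
        long close = "</blockquote>".length(); // 13
        long para = "<p>a</p>".length();       // 8
        return depth * (open + paragraphsPerLevel * para) + depth * close;
    }

    // ASSUMED output model: a paragraph at depth k costs 2*k prefix characters
    // plus 1 text character plus 1 newline.
    static long estimatedOutputSize(int depth, int paragraphsPerLevel) {
        long total = 0;
        for (int k = 1; k <= depth; k++) {
            total += (long) paragraphsPerLevel * (2L * k + 2);
        }
        return total;
    }

    public static void main(String[] args) {
        int depth = 400;
        int paragraphs = 800;
        System.out.println("input chars:            " + inputSize(depth, paragraphs));
        System.out.println("estimated output chars: " + estimatedOutputSize(depth, paragraphs));
    }
}
```

Doubling the nesting depth doubles the input but roughly quadruples the estimate, which is the O(N) versus O(N^2) gap described above.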

Same code with different parameters:

        String html = ("<blockquote>".repeat(420) +
            "a<br/>".repeat(400 * 420))
             + "</blockquote>".repeat(420);

This generates 1 MB of input and 142 MB of output.

These sizes are well within the range I encounter in emails.

Is there a way to cap the allocated memory (i.e. the size of the output) and simply throw when the cap is exceeded, as a defense mechanism?

This would protect me from DoS attacks through unbounded memory allocation, and it would be a condition for adoption.
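In the absence of such an option, one possible stopgap is to reject deeply nested input before handing it to the converter. The sketch below is not a Flexmark feature: `NestingGuard`, `maxNestingDepth` and `requireShallow` are hypothetical names, it uses only the JDK, and it counts tags with a regex rather than parsing the HTML, so it is a coarse pre-filter and not a precise bound on the output size.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestingGuard {
    // Matches the start of opening and closing tags of nesting containers.
    // Group 1 is "/" for a closing tag and empty for an opening tag.
    private static final Pattern NESTING_TAG =
        Pattern.compile("<(/?)(blockquote|ul|ol)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the deepest nesting level of the tracked tags in the given HTML. */
    public static int maxNestingDepth(String html) {
        int depth = 0;
        int max = 0;
        Matcher m = NESTING_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                depth++;
                max = Math.max(max, depth);
            } else if (depth > 0) {
                // Ignore stray closing tags instead of going negative.
                depth--;
            }
        }
        return max;
    }

    /** Throws if the HTML nests the tracked tags deeper than maxDepth. */
    public static void requireShallow(String html, int maxDepth) {
        int depth = maxNestingDepth(html);
        if (depth > maxDepth) {
            throw new IllegalArgumentException(
                "HTML nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }
}
```

A caller would invoke `NestingGuard.requireShallow(html, limit)` before `convert(html)`, with a limit chosen from what legitimate emails actually contain. It does not bound memory exactly, but it cuts off the quadratic term that the nesting depth drives.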

chibenwa, Aug 26 '24 12:08

A similar amplification also exists with lists.

For example:

    @Test
    public void boom() {
        String html = ("<ul>" + "<li>a</li>".repeat(400)).repeat(420)
             + "</ul>".repeat(420);

        System.out.println(html.length() + " bytes");

        String plainText = FlexmarkHtmlConverter.builder()
            .build().convert(html);

        System.out.println(plainText.length() + " bytes");
    }

=>

1683780 bytes
71064000 bytes

chibenwa, Aug 26 '24 12:08