commonmark-java

DocumentParser drops whitespaces at the beginning of a line

Open panpeter opened this issue 8 months ago • 2 comments

When DocumentParser is used with no enabled block types, the resulting Text nodes do not include the whitespace at the beginning of a line.

For example, when we use a parser with no enabled block types and the input is:

- text 1
  - text 2

the expected result is a document whose paragraph contains two Text nodes:

  • Text("- text 1")
  • Text("  - text 2")

but the literal of the second Text is "- text 2" (without the two leading spaces).

To illustrate, here is a sample test that fails:

public class ParserTest {
    
    ...
    
    @Test
    public void noBlockTypes() {
        String given = "- text 1\n  - text 2";
        Parser parser = Parser.builder().enabledBlockTypes(Collections.<Class<? extends Block>>emptySet()).build();
        Node document = parser.parse(given);

        Node child = document.getFirstChild();
        assertThat(child, instanceOf(Paragraph.class));

        child = child.getFirstChild();
        assertThat(child, instanceOf(Text.class));
        assertEquals("- text 1", ((Text) child).getLiteral());

        child = child.getNext();
        assertThat(child, instanceOf(SoftLineBreak.class));

        child = child.getNext();
        assertThat(child, instanceOf(Text.class));
        assertEquals("  - text 2", ((Text) child).getLiteral());
    }
}

panpeter, Oct 20 '23 08:10

The reason for this is the paragraph parser. The spec says that leading whitespace is skipped: https://spec.commonmark.org/0.31.2/#example-222
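
As a rough illustration of that rule (a standalone sketch, not the library's actual paragraph parser), skipping leading spaces or tabs on each paragraph line could look like this. The class and method names are hypothetical:

```java
public class ParagraphLeadingWhitespace {

    // Hypothetical helper: drops leading spaces and tabs from a single line,
    // mirroring the spec's statement that they are skipped in paragraphs.
    public static String skipLeading(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(i);
    }

    public static void main(String[] args) {
        // The input from this issue; brackets make the whitespace visible.
        for (String line : "- text 1\n  - text 2".split("\n")) {
            System.out.println("[" + skipLeading(line) + "]");
        }
    }
}
```

With the input from this issue, both lines come out without indentation, which matches the Text literals the failing test observes.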

Not sure how we would handle it. We can't add the leading whitespace to the literal of Text nodes as that would change rendering for existing code, but maybe we could add it as another attribute.

Note that you should be able to work around this limitation by checking the source spans of the text (see includeSourceSpans on Parser.Builder).
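
A minimal sketch of that workaround, under some assumptions: the parser was built with includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES), and the line and column index passed in below come from the first entry of the Text node's getSourceSpans() (getLineIndex() and getColumnIndex()). To keep the example free of dependencies, those values are plain ints here, and the class and method names are hypothetical:

```java
public class LeadingWhitespace {

    /**
     * Returns the literal prefixed with the whitespace that precedes it on its
     * source line. If anything other than spaces or tabs precedes the text,
     * or the indexes are out of range, the literal is returned unchanged.
     */
    public static String restore(String input, int lineIndex, int columnIndex, String literal) {
        // Split on any line ending; -1 keeps trailing empty lines.
        String[] lines = input.split("\r\n|\r|\n", -1);
        if (lineIndex < 0 || lineIndex >= lines.length) {
            return literal;
        }
        String line = lines[lineIndex];
        if (columnIndex <= 0 || columnIndex > line.length()) {
            return literal;
        }
        String prefix = line.substring(0, columnIndex);
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (c != ' ' && c != '\t') {
                return literal;
            }
        }
        return prefix + literal;
    }

    public static void main(String[] args) {
        String input = "- text 1\n  - text 2";
        // Assumed span of the second Text node: line 1, column 2.
        System.out.println("[" + restore(input, 1, 2, "- text 2") + "]");
        // prints "[  - text 2]"
    }
}
```

This leaves the Text literal untouched, so existing rendering is unaffected, and the caller has to keep the original input around to look up the line.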

robinst, Feb 08 '24 12:02

See also https://github.com/commonmark/commonmark-java/pull/290#issuecomment-1986613844

robinst, Mar 09 '24 00:03