commonmark-java
DocumentParser drops whitespaces at the beginning of a line
When using `DocumentParser` with no enabled block types, the `Text` nodes do not include whitespace at the beginning of a line.
For example, when we use a parser with no enabled block types and the input is (note the leading space on the second line):

```
- text 1
 - text 2
```
the expected result is a document whose paragraph contains two `Text` nodes (separated by a `SoftLineBreak`):
- `Text("- text 1")`
- `Text(" - text 2")`

but the literal of the second `Text` node is `"- text 2"`, without the leading whitespace.
To illustrate, here is a test that fails:
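```java
import static org.hamcrest.CoreMatchers.instanceOf;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertThat;

import java.util.Collections;
import org.commonmark.node.Block;
import org.commonmark.node.Node;
import org.commonmark.node.Paragraph;
import org.commonmark.node.SoftLineBreak;
import org.commonmark.node.Text;
import org.commonmark.parser.Parser;
import org.junit.Test;

public class ParserTest {
    // ...
    @Test
    public void noBlockTypes() {
        // No block types are enabled, so neither line should become a list item.
        String given = "- text 1\n - text 2";
        Parser parser = Parser.builder().enabledBlockTypes(Collections.<Class<? extends Block>>emptySet()).build();
        Node document = parser.parse(given);
        Node child = document.getFirstChild();
        assertThat(child, instanceOf(Paragraph.class));
        child = child.getFirstChild();
        assertThat(child, instanceOf(Text.class));
        assertEquals("- text 1", ((Text) child).getLiteral());
        child = child.getNext();
        assertThat(child, instanceOf(SoftLineBreak.class));
        child = child.getNext();
        assertThat(child, instanceOf(Text.class));
        // Fails: the actual literal is "- text 2" (the leading space is dropped).
        assertEquals(" - text 2", ((Text) child).getLiteral());
    }
}
```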
This is caused by the paragraph parser, which follows the spec: leading whitespace on paragraph lines is skipped, see https://spec.commonmark.org/0.31.2/#example-222
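To make the spec rule concrete, here is a minimal model of it. It is a simplification written for illustration, not commonmark-java's actual `ParagraphParser` or inline parser, and `ParagraphWhitespaceModel` is a hypothetical name. It approximates the combined effect the spec describes: leading spaces and tabs are removed from each paragraph line, and trailing spaces and tabs are removed from the end of the content. That is why `" - text 2"` comes out as `"- text 2"`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a paragraph's lines lose their
// leading whitespace. Not the library's real implementation.
final class ParagraphWhitespaceModel {

    static String rawParagraphContent(String input) {
        List<String> stripped = new ArrayList<>();
        // CommonMark line endings: \r\n, \r or \n.
        for (String line : input.split("\r\n|\r|\n", -1)) {
            stripped.add(stripLeadingSpacesAndTabs(line));
        }
        return stripTrailingSpacesAndTabs(String.join("\n", stripped));
    }

    // Only spaces and tabs count here, matching the spec's wording,
    // rather than every Unicode whitespace character.
    private static String stripLeadingSpacesAndTabs(String s) {
        int i = 0;
        while (i < s.length() && (s.charAt(i) == ' ' || s.charAt(i) == '\t')) {
            i++;
        }
        return s.substring(i);
    }

    private static String stripTrailingSpacesAndTabs(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == ' ' || s.charAt(end - 1) == '\t')) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

Under this model, `rawParagraphContent("- text 1\n - text 2")` returns `"- text 1\n- text 2"`; the inline parser would then turn the `\n` into the `SoftLineBreak` seen in the test.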
Not sure how we would handle it. We can't add the leading whitespace to the literal of `Text` nodes, as that would change rendering for existing code, but maybe we could add it as another attribute.
Note that you should be able to work around this limitation by checking the source spans of the text (see `includeSourceSpans` on `Parser.Builder`).
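A minimal sketch of that workaround might look like this. `LeadingWhitespace.recover` is a hypothetical helper, not part of commonmark-java. It assumes the parser was built with `includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES)` and that a `Text` node's first `SourceSpan` starts after the skipped whitespace, so everything before that column on the same source line is the dropped prefix.

```java
// Hypothetical helper (not part of commonmark-java): recovers the characters
// that precede a Text node on its source line.
final class LeadingWhitespace {

    /**
     * @param input       the exact string that was passed to Parser#parse
     * @param lineIndex   SourceSpan#getLineIndex() of the Text node's first span
     * @param columnIndex SourceSpan#getColumnIndex() of the Text node's first span
     * @return the characters on that line before the Text node starts
     */
    static String recover(String input, int lineIndex, int columnIndex) {
        // Split using CommonMark line endings: \r\n, \r or \n.
        String[] lines = input.split("\r\n|\r|\n", -1);
        if (lineIndex < 0 || lineIndex >= lines.length) {
            throw new IllegalArgumentException("lineIndex out of range: " + lineIndex);
        }
        String line = lines[lineIndex];
        if (columnIndex < 0 || columnIndex > line.length()) {
            throw new IllegalArgumentException("columnIndex out of range: " + columnIndex);
        }
        // With no block types enabled there are no container markers
        // (e.g. "> " or list bullets), so this prefix should be whitespace only.
        return line.substring(0, columnIndex);
    }

    // Assumed wiring with commonmark-java (textNode is a Text node from the parsed document):
    //   Parser parser = Parser.builder()
    //           .enabledBlockTypes(Collections.<Class<? extends Block>>emptySet())
    //           .includeSourceSpans(IncludeSourceSpans.BLOCKS_AND_INLINES)
    //           .build();
    //   SourceSpan span = textNode.getSourceSpans().get(0);
    //   String full = LeadingWhitespace.recover(input, span.getLineIndex(), span.getColumnIndex())
    //           + textNode.getLiteral();
}
```

If the second `Text` node in the test reports line 1, column 1, prepending the recovered prefix to its literal gives `" - text 2"`. With container blocks enabled, the prefix would also include their markers, so this approach would need extra filtering there.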
See also https://github.com/commonmark/commonmark-java/pull/290#issuecomment-1986613844