commonmark-java icon indicating copy to clipboard operation
commonmark-java copied to clipboard

Preserving empty line nodes

Open ijioio opened this issue 1 year ago • 6 comments

I am using commonmark library to generate emails. It is a really nice and easy to use library. It have both HtmlRenderer for rich email body, and TextContentRenderer for plain text email body. Looks like perfect solution. The person who sends an emails, prepares it using markdown syntax, then content is processed by both parsers and embedded to the email as plain and html body. But the problem is that parser discards line breaks (or empty lines).

Example:

Parser parser = Parser.builder().build();

Node document = parser.parse("foo\n\nbar\n\nbaz");

HtmlRenderer htmlRenderer = HtmlRenderer.builder().build();
TextContentRenderer textRenderer = TextContentRenderer.builder().build();

htmlRenderer.render(document);
textRenderer.render(document);

will result in for html

<p>foo</p>
<p>bar</p>
<p>baz</p>

for plain text:

foo
bar
baz

Parser just discards the empty lines completely. While for html this is acceptable cause each line will be wrapped with paragraph, for plain text renderer it is critical as the text looses its formatting.

What I expect is that parser will handle empty lines as a separate nodes, but renderers can process them differently, html renderer can just skip it, while plain text renderer will append them to the ouptut:

foo

bar

baz

I was not dig too deep in the sources, so not sure if there is some possible solution for that at the current state...

ijioio avatar Jul 29 '22 05:07 ijioio

Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:

foo

bar

baz

You would be happy?

I'm honestly not sure why it renders it this way at the moment:

foo
bar
baz

@JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

robinst avatar Oct 18 '22 06:10 robinst

@ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

robinst avatar Oct 18 '22 06:10 robinst

@JinneeJ do you remember what the reason was for this? We might have to make this configurable for backwards compat reasons, but it would still be good to know why.

@robinst I don't remember exactly 😅 I guess it didn't occur to me that we might need adding extra empty lines for paragraphs. But anyway having backward compatibility would be great.

JinneeJ avatar Oct 24 '22 14:10 JinneeJ

Hi @ijioio. As far as I understand, you're just unhappy with what the TextContentRenderer does with paragraphs, right? If it rendered it as follows:

foo

bar

baz

You would be happy?

Yes, exactly! I just want plain emails still keep some logical structure of the original text.

ijioio avatar Jan 15 '23 05:01 ijioio

@ijioio Also note, you can already change this behavior with existing API via TextContentRenderer.builder().nodeRendererFactory(myFactory) and overriding the rendering of Paragraph nodes.

Thank you for the suggestion. Not sure is it a universal solution though. I mean, that I will never know in advance in which commonmark syntax block these new line breaks will be used. Is that implies I need to override all possible cases?

ijioio avatar Jan 15 '23 05:01 ijioio

+1 for the original feature request of actually preserving empty lines. As a specific use case, I'm using commonmark to build a sort of rich text editor using markdown as the data, which requires maintaining a mapping between the literal and rendered text. So, e.g., in

An _italic_ text.

when the user places the cursor at index 5 in the rendered text, this actually maps to index 6 in the markup text, to account for the first underscore.

Unfortunately this breaks with newlines since they are simply discarded, so I have no way of knowing how many newlines the user entered.

mattrob33 avatar Feb 07 '23 14:02 mattrob33