Too much blankspace gets stripped
Coming from https://github.com/forem/forem/issues/8457, I think that ReverseMarkdown strips too much blankspace in some scenarios. Here's a failing test:
it 'keeps whitespace surrounding links' do
result = ReverseMarkdown.convert("a\n<a href='1'>link</a>\nis good\nbut blankspace is better")
expect(result).to eq "a [link](1) is good but blankspace is better\n\n"
end
The output is a[link](1)is good but blankspace is better\n\n. This happens because the text converter calls remove_border_newlines, and the fact that the middle line is a link means that it will be its own nokogiri node, and the three nodes will be joined with no whitespace. I tried changing remove_border_lines to squeeze instead of removing everything, but this doesn't work: it keeps whitespace in scenarios where it shouldn't:
first<p>second</p>third becomes first\n\nsecond\n\n third\n\n.
I still want to investigate this further, but I decided to post this now to share my findings.
I was curious as to why this didn't happen with <strong> tags. The following test does succeed:
it 'keeps whitespace surrounding bolds' do
result = ReverseMarkdown.convert("a\n<strong>potato</strong>\nis good\nbut blankspace is better")
expect(result).to eq "a **potato** is good but blankspace is better\n\n"
end
After some puts debuggering, I see that this one also gets its blankspace trimmed in the first pass, but then cleaner.tidy adds it back again in clean_tag_borders:
converters.convert: "\n\na**potato**is good but blankspace is better\n\n"
cleaner.tidy: "a **potato** is good but blankspace is better\n\n"
So it seems that we're losing information during convert, and assuming that ** would always be preceeded by a blankspace, which gets added in tidy (clean_tag_borders). The same technique wouldn't work for links, since the surrounding blankspace doesn't need to be there for markdown links.
My previous attempt of changing remove_border_lines to use squeeze doesn't work because it would break this test:
it 'squeezes whitespace when using paragraph tags' do
result = ReverseMarkdown.convert("first paragraph\n<p>second paragraph</p>\nthird\nin separate lines")
expect(result).to eq "first paragraph\n\nsecond paragraph\n\nthird in separate lines\n\n"
end
with the following diff:
first paragraph
second paragraph
-third in separate lines
+ third in separate lines