html-to-markdown

🐛 Bug <br> is converted into two new lines (\n\n)

Open prologic opened this issue 4 years ago • 9 comments

Describe the bug

In my testing, I've found that the HTML tag <br /> gets turned into two newlines (\n\n).

Example:

$ ./html2md -i
Hello<br />World
Hello

World

HTML Input

Hello<br />World

Generated Markdown

Hello

World

Expected Markdown

Hello
World

Additional context

Is there any way to control this behaviour? I get that this might be interpreted as a "paragraph", but I would only expect that if there are two <br /> tags or an actual paragraph <p>...</p>. Thanks!
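A minimal sketch of the mapping requested above, in Go (the library's language): a single `<br>` becomes one newline, and a run of two or more becomes a blank line. `expectedBreaks` is a hypothetical helper, not part of html-to-markdown, and it rewrites the raw HTML string with regular expressions instead of walking a parsed document, so treat it as an illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// A run of two or more <br>, <br/> or <br /> tags (any case), with
	// optional whitespace between them.
	brRun = regexp.MustCompile(`(?i)(?:<br\s*/?>\s*){2,}`)
	// Any single <br> tag left over after runs have been replaced.
	brOne = regexp.MustCompile(`(?i)<br\s*/?>`)
)

// expectedBreaks is a hypothetical helper (not part of html-to-markdown) that
// applies the requested mapping: a run of <br> tags becomes a blank line
// ("\n\n") and a lone <br> becomes a single "\n". It does not parse HTML.
func expectedBreaks(html string) string {
	out := brRun.ReplaceAllString(html, "\n\n")
	return brOne.ReplaceAllString(out, "\n")
}

func main() {
	fmt.Printf("%q\n", expectedBreaks("Hello<br />World"))     // "Hello\nWorld"
	fmt.Printf("%q\n", expectedBreaks("Hello<br><br />World")) // "Hello\n\nWorld"
}
```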

prologic, Aug 02 '21 01:08

This is expected behavior. A line break in Markdown requires two newline characters. A single newline character will not render as a line break; instead, it will render as a space.

wcalandro, Jan 13 '22 23:01

According to this page (https://www.markdownguide.org/basic-syntax), a newline in Markdown should be formatted as follows: "To create a line break or new line (<br>), end a line with two or more spaces, and then type return."

I have also seen implementations where <br> and <p></p> are converted to one and two newlines, respectively (as prologic recommends).

I don't know if there is a real standard for this. However, <br> must be treated differently from <p></p> so as not to lose information when converting from HTML to Markdown.
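To make the information-loss point concrete, here is a toy converter. It is hypothetical: it handles only `<p>` and `<br />` on flat strings and is not html-to-markdown's implementation. With `\n\n` as the `<br>` output, a pair of paragraphs and a single line break produce identical Markdown; with the two-trailing-spaces form from markdownguide.org, they stay distinguishable.

```go
package main

import (
	"fmt"
	"strings"
)

// toyConvert is a hypothetical, deliberately tiny HTML-to-Markdown converter
// that only understands <p> and <br />. br is the Markdown emitted for <br />.
func toyConvert(html, br string) string {
	// strings.Replacer compares old strings in argument order, so the
	// paragraph boundary "</p><p>" wins over the shorter "</p>".
	r := strings.NewReplacer(
		"</p><p>", "\n\n", // paragraph boundary -> blank line
		"<p>", "",
		"</p>", "",
		"<br />", br, // the replacement under discussion
	)
	return r.Replace(html)
}

func main() {
	paragraphs := "<p>Line 1</p><p>Line 2</p>"
	lineBreak := "<p>Line 1<br />Line 2</p>"

	for _, br := range []string{"\n\n", "  \n"} {
		a := toyConvert(paragraphs, br)
		b := toyConvert(lineBreak, br)
		fmt.Printf("br=%q: %q vs %q, distinguishable=%v\n", br, a, b, a != b)
	}
}
```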

akuehnis, Aug 31 '22 18:08

Take this HTML as the input:

<p>Line 1<br />Line 2</p>

With html-to-markdown and the normal CommonMark behaviour for "br" (two newlines), we get:

Line 1

Line 2

With CommonMark (see playground), this renders as:

<p>Line 1</p>
<p>Line 2</p>

If you add a custom rule for "br" that just returns a single newline with:

return String("\n")

You get this output:

Line 1
Line 2

With CommonMark (see playground), this renders as:

<p>Line 1
Line 2</p>
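The two renderings above follow from a small part of the CommonMark spec: a blank line ends a paragraph, while a single line ending inside a paragraph is a soft break. The toy model below implements only those rules plus hard breaks (a line ending in two or more spaces, or in a backslash). `renderParagraphs` is a hypothetical name and this is not a real CommonMark parser: no escaping, no other block types, and it assumes paragraphs are separated by exactly one blank line.

```go
package main

import (
	"fmt"
	"strings"
)

// renderParagraphs is a toy model of three CommonMark rules:
//   - a blank line separates paragraphs;
//   - inside a paragraph, a line ending is a soft break (kept as "\n");
//   - a line ending preceded by two or more spaces or a backslash is a
//     hard break, rendered as "<br />" (not at the end of a paragraph).
func renderParagraphs(md string) string {
	var out []string
	for _, block := range strings.Split(md, "\n\n") {
		lines := strings.Split(block, "\n")
		for i := range lines {
			last := i == len(lines)-1
			switch {
			case !last && strings.HasSuffix(lines[i], "  "):
				lines[i] = strings.TrimRight(lines[i], " ") + "<br />"
			case !last && strings.HasSuffix(lines[i], `\`):
				lines[i] = strings.TrimSuffix(lines[i], `\`) + "<br />"
			default:
				// Trailing spaces before a soft break or at the end of a
				// paragraph are dropped.
				lines[i] = strings.TrimRight(lines[i], " ")
			}
		}
		out = append(out, "<p>"+strings.Join(lines, "\n")+"</p>")
	}
	return strings.Join(out, "\n")
}

func main() {
	for _, md := range []string{"Line 1\n\nLine 2", "Line 1\nLine 2", "Line 1  \nLine 2"} {
		fmt.Printf("%q\n  -> %q\n", md, renderParagraphs(md))
	}
}
```

For the first two inputs it reproduces the playground output shown above; the third renders as `<p>Line 1<br />\nLine 2</p>`, which is what a hard line break would give.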

If we compare the different implementations (see babelmark), this behaviour is shared by most of them.


The markdown rendering on github.com works differently however 🤷‍♂️


If we want to be extra precise, the html-to-markdown library would also need to support hard line breaks. However, that would require some other changes.

So for now, the current behaviour is going to stay as it is. Changing it would break the output for other implementations. However, you are free to change the behaviour by writing a very simple custom rule.

JohannesKaufmann, May 08 '23 20:05

> The markdown rendering on github.com works differently however

Then can we have the GitHub-flavored markdown use single line breaks, please?
(without the need for hard line breaks, as GitHub-flavored markdown is supposed to be tailored towards github.com)

And the change would be minimal, I'd presume, i.e. changing from outputting \n\n to the following instead:

output "\n"
if (not in the GitHub-flavored markdown mode) output "\n"
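The pseudocode above, written out as Go. `brReplacement` and `gfmMode` are hypothetical names chosen for this sketch; they are not an existing html-to-markdown option. In a converter, the returned string would be what the rule for `br` emits.

```go
package main

import "fmt"

// brReplacement is a hypothetical helper mirroring the pseudocode: always
// emit one newline, and add a second one unless GFM mode is enabled.
func brReplacement(gfmMode bool) string {
	out := "\n"
	if !gfmMode {
		out += "\n" // current default: a blank line, i.e. a paragraph break
	}
	return out
}

func main() {
	fmt.Printf("default: %q\n", brReplacement(false)) // "\n\n"
	fmt.Printf("gfm:     %q\n", brReplacement(true))  // "\n"
}
```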

Thanks

suntong, May 10 '23 23:05

> Then can we have the GitHub-flavored markdown use single line breaks, please?

There are other renderers, like the GitHub Flavored Markdown extension from goldmark, that also implement the spec, and I don't want to break those.

Right now, it seems like it's only github.com that causes the problem...

JohannesKaufmann, May 17 '23 18:05


What about an additional built-in rule for these line breaks? @suntong seems to be against the idea of altering the behavior of this project's GFM plugin or adding a new parameter to accomplish this.

God-damnit-all, May 21 '23 01:05

@suntong, I doubt you want a PR for this, but: https://github.com/ImportTaste/html2md/commit/082a6fb51863893a955aa3d59bf241224c48fe0b

Works well for me. I really don't think @JohannesKaufmann is going to budge.

God-damnit-all, Jun 15 '23 04:06

NP, I'd love to, since it works well for you, and also because I agree with you that such a feature might never be accepted here. So, send the PR pls.

suntong, Jun 15 '23 14:06