Markdown file with CRLF line endings will cause wrong-style output

Open jinliming2 opened this issue 7 years ago • 7 comments

Hi, when I started using blackfriday to parse my Markdown file, I got malformed output. My Markdown file looks like this, with Windows-style CRLF (\r\n) line endings:

# Hello World

This is my content.

And I wrote the code like this in my project:

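package main

import (
	"io/ioutil"
	"log"

	"github.com/russross/blackfriday/v2" // import path assumed; the original post omitted imports
)

func main() {
	file, err := ioutil.ReadFile("./test.md")
	if err != nil {
		log.Fatal(err) // stop here instead of parsing empty input
	}
	println(string(file))
	println("---------------------------------------------")
	out := blackfriday.Run(file)
	println(string(out))
}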

And now, when I ran my program, I got output in my console like this:

# Hello World

This is my content.

---------------------------------------------
</h1>ello World

<p>
</p> is my content.

Is this a problem in blackfriday? Thanks.

jinliming2 avatar Jan 04 '18 16:01 jinliming2

I can confirm that text with Windows line endings (CRLF) is not handled correctly. My current workaround is to replace them before passing the text to blackfriday:

markdownWithUnixLineEndings := strings.Replace(markdown, "\r\n", "\n", -1)
blackfriday.Run([]byte(markdownWithUnixLineEndings))

klingtnet avatar Jan 14 '18 15:01 klingtnet

Confirmed here as well. I have a WIP fix, but I'm not spending much time on Blackfriday lately, so haven't cleaned it up yet.

rtfb avatar Jan 18 '18 19:01 rtfb

Submitted #428, feel free to review and comment.

rtfb avatar Jan 18 '18 20:01 rtfb

For those that need a solution and can't wait for the PR to come up, this is what we did at Gitea to make it work in the meantime: go-gitea/gitea#8925

Basically, we convert every \r or \r\n sequence to \n. We ran some tests and arrived at that algorithm as the fastest (apart from modifying the input in place, which is even faster).

guillep2k avatar Nov 12 '19 19:11 guillep2k

Here's the actual code:

https://github.com/go-gitea/gitea/blob/dc8036dcc680abab52b342d18181a5ee42f40318/modules/util/util.go#L68-L102

It just rips out all \r\n and \r, replacing them with \n. So if for some perverse reason you actually intend there to be a raw \r in your markdown page, it will become a newline. It is fast, however.
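
For readers who cannot follow the link, the idea described above can be sketched roughly as follows. This is not the Gitea code; `normalizeEOL` is a hypothetical helper name, and the sketch favors clarity over the speed the linked version was tuned for.

```go
package main

import "fmt"

// normalizeEOL returns a copy of input in which every "\r\n" pair and every
// lone "\r" has been replaced by a single "\n".
// Hypothetical helper for illustration; not the Gitea implementation.
func normalizeEOL(input []byte) []byte {
	out := make([]byte, 0, len(input))
	for i := 0; i < len(input); i++ {
		c := input[i]
		if c == '\r' {
			// For a CRLF pair, skip the '\n' so only one newline is emitted.
			if i+1 < len(input) && input[i+1] == '\n' {
				i++
			}
			c = '\n'
		}
		out = append(out, c)
	}
	return out
}

func main() {
	src := []byte("# Hello World\r\n\r\nThis is my content.\r\n")
	fmt.Printf("%q\n", normalizeEOL(src))
}
```

The result of `normalizeEOL` would then be passed to `blackfriday.Run` in place of the raw file contents.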

zeripath avatar Nov 12 '19 19:11 zeripath

If you would prefer replacing in place, remove the definition of tmp and replace all references to tmp with input, or vice versa. That would do it.
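
A rough sketch of what an in-place variant could look like, assuming the caller does not need the original bytes afterwards. The name `normalizeEOLInPlace` is made up for this example and does not come from the linked file.

```go
package main

import "fmt"

// normalizeEOLInPlace rewrites buf so that "\r\n" and lone "\r" become "\n",
// reusing buf's backing array, and returns the (possibly shorter) slice.
// Hypothetical helper for illustration; not the Gitea implementation.
func normalizeEOLInPlace(buf []byte) []byte {
	w := 0 // write index; never runs ahead of the read index r
	for r := 0; r < len(buf); r++ {
		c := buf[r]
		if c == '\r' {
			// Collapse a CRLF pair into one newline.
			if r+1 < len(buf) && buf[r+1] == '\n' {
				r++
			}
			c = '\n'
		}
		buf[w] = c
		w++
	}
	return buf[:w]
}

func main() {
	src := []byte("line one\r\nline two\rline three\n")
	fmt.Printf("%q\n", normalizeEOLInPlace(src))
}
```

Because the output is never longer than the input, no allocation is needed, but the caller's slice is overwritten.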

zeripath avatar Nov 12 '19 19:11 zeripath

I just do a bytes.Replace on the input, but this had me confused for an hour or so.
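
A minimal sketch of that approach using only the standard library. `crlfToLF` is a hypothetical wrapper name, and `bytes.ReplaceAll` assumes Go 1.12 or later (on older versions, `bytes.Replace` with a count of -1 does the same). Note that this handles only \r\n pairs, not lone \r characters.

```go
package main

import (
	"bytes"
	"fmt"
)

// crlfToLF replaces every "\r\n" in input with "\n" and returns a new slice.
// Lone "\r" bytes are left untouched. Hypothetical helper for illustration.
func crlfToLF(input []byte) []byte {
	return bytes.ReplaceAll(input, []byte("\r\n"), []byte("\n"))
}

func main() {
	src := []byte("# Hello World\r\n\r\nThis is my content.\r\n")
	fmt.Printf("%q\n", crlfToLF(src))
}
```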

TACIXAT avatar May 17 '20 22:05 TACIXAT