Line endings are forcibly converted to CRLF

Open ghost opened this issue 3 years ago • 10 comments

Steps to reproduce

  1. Paste something on https://bpa.st/ via the website
  2. Get the raw link
  3. View it with something that shows line endings, e.g.: curl -s https://bpa.st/raw/M5HQ | cat -v

Expected behavior

Line endings should ideally be LF or left untouched. CRLF is unsuitable for Unix systems in most cases, especially for source code.

Actual behavior

Pastes get CRLF line endings even when they weren't explicitly pasted as such:

❯ curl -s 'https://bpa.st/raw/M5HQ' | cat -v
foo^M
bar%
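
For anyone without `cat -v` at hand, the same check can be sketched in Python. This is only an illustration: `count_line_endings` is a hypothetical helper, and `sample` is typed in by hand from the output above rather than fetched from the server.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF pair contains one LF and one CR, so subtract the pairs.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


# Bytes implied by the `cat -v` output: "foo^M" on one line, then "bar"
# with no trailing newline (the "%" is the shell's marker for that).
sample = b"foo\r\nbar"
print(count_line_endings(sample))  # {'crlf': 1, 'lf': 0, 'cr': 0}
```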

ghost avatar Dec 03 '21 09:12 ghost

I also noticed this problem and found it quite annoying. It means, for example, that you can't execute a downloaded bash script without prior conversion.

But I don't see an obvious solution to the problem. There are different line endings and picking the right one must be done during download, because the uploader could use a different OS than the downloader.

What I propose is this:

  • offer 3 kinds of raw download links, one per line ending: Unix (LF), Windows (CRLF) and classic Mac OS (CR)
  • when showing the raw download link in the browser, show a dropdown offering Unix, Windows and Mac OS line endings
  • when you change the value in the dropdown, the raw link is changed with JavaScript
  • the OS given in the browser user-agent string is used to select the default value of the dropdown
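
The conversion step behind such download links could look roughly like the sketch below. Nothing here is pinnwand code: the style names, `convert_line_endings` and `default_style` are hypothetical, and the user-agent check is a deliberately crude heuristic written in Python for illustration, although the proposal would do that part in the browser.

```python
import re

# Hypothetical style names for the three proposed download variants.
LINE_ENDINGS = {"unix": "\n", "windows": "\r\n", "mac": "\r"}

# Match CRLF first so a pair is treated as a single line ending.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def convert_line_endings(text: str, style: str) -> str:
    """Rewrite every line ending in `text` to the one named by `style`."""
    ending = LINE_ENDINGS[style]
    return _ANY_NEWLINE.sub(lambda _match: ending, text)


def default_style(user_agent: str) -> str:
    """Guess a default style from a User-Agent string (assumption: only
    Windows gets CRLF; everything else, including modern macOS, gets LF)."""
    return "windows" if "Windows" in user_agent else "unix"


paste = "foo\r\nbar\nbaz"
print(repr(convert_line_endings(paste, "unix")))     # 'foo\nbar\nbaz'
print(repr(convert_line_endings(paste, "windows")))  # 'foo\r\nbar\r\nbaz'
```

Because the conversion happens at download time, the stored paste can stay in whatever form it was uploaded in.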

gvegidy avatar Dec 28 '21 17:12 gvegidy

If the language is properly selected it should be able to automatically choose the correct line ending (e.g. LF for bash scripts, CRLF for batch).
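
As a rough sketch of that idea, assuming the paste's lexer name is available as a plain string: the set of names below is illustrative only and would need checking against the lexer aliases pinnwand actually exposes, and `line_ending_for_lexer` is a hypothetical helper.

```python
# Illustrative only: lexer names assumed to denote Windows batch files.
CRLF_LEXERS = {"bat", "batch", "dosbatch", "winbatch"}


def line_ending_for_lexer(lexer: str, default: str = "\n") -> str:
    """Return CRLF for lexers that need it, otherwise the default (LF)."""
    return "\r\n" if lexer.lower() in CRLF_LEXERS else default


print(repr(line_ending_for_lexer("bash")))  # '\n'
print(repr(line_ending_for_lexer("bat")))   # '\r\n'
```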

0xallie avatar Dec 28 '21 17:12 0xallie

That might work for bash scripts or batch files as they are only designed for one platform.

But what about text files or python scripts? They should have the proper line endings on all platforms.

gvegidy avatar Dec 28 '21 19:12 gvegidy

Text files are debatable, but Python scripts should always be LF, as (at least on Linux) they will refuse to run if they use CRLF line endings.

The only real reason to use CRLF for code (except things like batch that require it) is compatibility with Notepad, which I wouldn't consider that important as nobody should really be using Notepad to edit code.

0xallie avatar Dec 28 '21 19:12 0xallie

> but Python scripts should always be LF, as (at least on Linux) they will refuse to run if they use CRLF line endings.

Yes, that is exactly the issue. But Python is used quite commonly on Windows and it should have CRLF there.

gvegidy avatar Dec 28 '21 19:12 gvegidy

Again, there is no reason (other than Notepad compatibility) for Python scripts to have CRLF on Windows even if it works. They will run fine with LF too.

0xallie avatar Dec 28 '21 19:12 0xallie

Huh, interesting issue. As far as I know I'm not explicitly converting line endings, and the raw files should be left just as the name implies: raw. See here for where it gets nabbed from the HTTP request:

https://github.com/supakeen/pinnwand/blob/master/pinnwand/handler/website.py#L121

and here for where it goes into the database:

https://github.com/supakeen/pinnwand/blob/master/pinnwand/database.py#L135

If anything, something would be implicitly converting it along the way (perhaps at the rendering stage, perhaps at the input stage). Does either of you have an idea where that would be happening before I take a closer look?

As for the discussion about CRLF vs LF on its own, that's a hard decision to make. I'd be against making it lexer-specific, both because one can't say a language is used on one platform only and because it's a big list to maintain, which would need to be kept in sync with the lexers Pygments supports upstream :(

Perhaps a separate download or raw view? Is it common to use text editors on Windows that can't deal with a missing carriage return still?

supakeen avatar Dec 28 '21 19:12 supakeen

HTTP generally uses CRLF, so maybe it's that? I haven't looked too deep into it though.

0xallie avatar Dec 28 '21 19:12 0xallie

> is it common to use text editors on Windows that can't deal with a missing carriage return still?

The "notepad" delivered with Windows 10 recently learned to deal with LF. But the notepad that comes with older Windows versions doesn't.

But it is not just editors, you might want to run a batch file or powershell script downloaded from pinnwand without having to run a converter first. And they need CRLF on Windows to properly work.

gvegidy avatar Dec 28 '21 19:12 gvegidy

> HTTP generally uses CRLF, so maybe it's that? I haven't looked too deep into it though.

Yes, this looks likely to me. I just recorded the POST request in the browser (Firefox on Linux) and this was in the POST data:

_xsrf=2%7Cfb72ec3c%7C75219b05f93d9119e20dd080d9436564%7C1640713522&lexer=text&filename=&raw=this+is+a+test%0D%0Anext+line%0D%0Anext+line%0D%0A%0D%0A&expiry=1hour
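
Each `%0D%0A` in that body is a URL-encoded CRLF. Browsers normalize newlines in textarea values to CRLF when submitting a form, so the paste already arrives with CRLF even when typed on Linux. A minimal sketch of decoding the recorded body and normalizing it back to LF, using only the standard library; the `_xsrf` field is left out for brevity, and normalizing on input is just one possible fix, not something pinnwand is known to do:

```python
from urllib.parse import parse_qs

# The recorded form body from above, minus the _xsrf token.
post_data = (
    "lexer=text&filename=&raw=this+is+a+test%0D%0Anext+line"
    "%0D%0Anext+line%0D%0A%0D%0A&expiry=1hour"
)

fields = parse_qs(post_data, keep_blank_values=True)
raw = fields["raw"][0]
print(repr(raw))  # 'this is a test\r\nnext line\r\nnext line\r\n\r\n'

# One possible fix (an assumption, not pinnwand's behavior): undo the
# browser's CRLF normalization before the paste is stored.
normalized = raw.replace("\r\n", "\n")
print(repr(normalized))  # 'this is a test\nnext line\nnext line\n\n'
```

The drawback of normalizing on input is that a paste that really was meant to carry CRLF can no longer be told apart from one that was not, since the form submission makes both look the same.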

gvegidy avatar Dec 28 '21 20:12 gvegidy