mdformat Add `--end-of-line=native`

Context

I have a repo that uses mdformat as a pre-commit hook; it works beautifully on Linux and macOS.

If I clone the repo on Windows, under Git's default settings (core.autocrlf = true), the Markdown files in the working tree have CRLF newlines. When the pre-commit hook runs, it replaces newlines with LF (as expected in the absence of --end-of-line). This leads to the confusing situation where:

pre-commit reports no errors, yet
after the commit, there remain unstaged changes (consisting of CRLF -> LF replacement).

Specifying --end-of-line=crlf can fix this, but we would need to specify this on Windows only, and there is no way to do so in .pre-commit-config.yaml or .mdformat.toml, as far as I can tell.

It is probably also possible to work around this using .gitattributes, but that adds more things to maintain.

Proposal

Add the choice native, in addition to the current choices (lf, crlf), to the --end-of-line option.
- And similarly to the end_of_line key of .mdformat.toml.
When --end-of-line=native is given, use the value of os.linesep to determine the newlines to use.
The default newline remains lf to maintain compatibility.

The choice of the name native is bikesheddable (perhaps auto?).

Tasks and updates

[ ] Settle on a name (native, auto, or other)
[ ] Implement

Jul 16 '22 14:07 marktsuchida

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

Jul 16 '22 14:07 welcome[bot]

One reason to prefer native over auto is that some people may think that auto means keep the existing newline style of the file, which is not what I'm proposing here (though also a possible feature).

Jul 16 '22 15:07 marktsuchida

Thanks for the issue!

My understanding is that core.autocrlf is set to false by default (see https://stackoverflow.com/a/48230871). Would it solve your issue if you change your git configuration to that? My understanding is that most modern text editors and IDEs on Windows work fine with bare LF newlines, so autocrlf is probably not needed.

Jul 16 '22 20:07 hukkin

Thanks for the quick reply.

The Windows installer for Git, by default, sets the effective default to core.autocrlf = true (see the second answer to the same SO question; I also checked this myself today), so most people will have that setting by default on Windows. And personally I think it is a wise default.

Most code editors on Windows work with (and preserve) LF line endings, but typically default to CRLF for newly created files, so autocrlf is actually quite important for keeping CRLF newlines out of the repository. And programs that are not smart enough to preserve newline style will typically emit CRLF on Windows (where they would emit LF elsewhere), making mdformat a bit of an outlier, so changing Git behavior to suit mdformat can end up making things more complicated for everything else.

I just checked how Black behaves, and it looks like it detects and preserves the original newline type of the file when reformatting, which makes sense, being the least likely to cause problems under any environment. When the file contains mixed LF and CRLF newlines (it happens in the wild), the first style encountered appears to be used. I found this to be the case on both Windows and macOS.

So an --end-of-line=keep option might also (or alternatively) be worth considering, though its implementation may be slightly less simple than --end-of-line=native. As far as I'm concerned, either would solve my problem.

BTW, I'm happy to work on a PR for this if the idea is accepted.

Jul 17 '22 00:07 marktsuchida

I also just stumbled accross this issue and second it.

Using Git for windows (the default and recomend way to get git onto windows), the default behavior is core.autocrlf = true (you could change it during installation, but thats beside the point..).

I also helped implementing this exact feature into yamllint for exacly this reason, see https://github.com/adrienverge/yamllint/pull/474 (there we settled on platform as the name btw.)

I could also see this beeing a plugin instead of a core functionalaty, but as far as I understand it, this would currently only be doable by having a second phrase run after the original one and change back the fileendings? or am I missing something?

Aug 22 '22 11:08 Cube707

Also some personal Options: (please don't take them as aggressive, your project is great 😄)

The platfrom defautl if crlf on windows, as stupid as that may be. So if users want to enforce a non-platform standart (lf), it should require an explicit config to do that, instead of beeing the default behavior. This is however probably not doable any more, as it would be a breaking change...

Relying on tooling to handle the promblem is also not a best practice in my opinion. While it will mostly work fine, there is always the posibily of some wird edge case and more importantly, this makes it less beginner friendly. Its really bad that an unexperienced user can download this project onto his windows machine and get this file change in ways he might not jet understand and possibly break some parts of this workflow by commiting different fileendings on different files.

Python by default makes it super easy to (not-)handle fileendings, so it feels like a step back to have to discuss this.

Aug 22 '22 12:08 Cube707

Sorry for the inactivity.

I'd prefer --end-of-line=keep over --end-of-line=native as I prefer not having (or testing) platform dependent side-effects.

If either of you is willing to make a PR for --end-of-line=keep, I'll be happy to review. If not, I can work on this myself (but can't promise when).

Let's decide which newline sequence to use based on the first one in the file, which is how @marktsuchida mentioned that Black also does it. In the special case that there are no newlines, default to a bare LF.

Regarding the critique by @Cube707: I understand the frustration, and I agree that the current default is probably not the best for the more casual users. Mdformat is, however, geared towards professional, collaborative workflows where there is a need for formatting that is byte-to-byte equal, independent of the person or machine running the formatter.

Aug 22 '22 12:08 hukkin

@hukkin

I experimented around a bit and also looked at the black implementation.

One issue came up: Do we care about \r as a line seperator? I belive Mac used that for a while but has long also switched to \n so I would assume we can ignore it, as the Markdown phrase would brake probably anyway?

Aug 22 '22 14:08 Cube707

I agree that --end-of-line=keep is the better solution.

I don't think we need to worry about the CR-only newlines of classic Mac OS. Mac OS X came out 20+ years ago and has used LF since. Markdown and Git were invented after that, so Markdown files with CR are unlikely. Python still supports CR, so if we only explicitly detect CRLF, we should end up converting CRs to LFs (which I think is what Black does), which seems reasonable behavior (and I would propose to leave it undocumented).

I'm happy to work on a PR for --end-of-line=keep, but may need a few weeks before I get to it. @Cube707 if you are interested, please don't hesitate to go ahead.

Aug 22 '22 16:08 marktsuchida

We cannot ignore lone CRs. They are just as valid of a line ending sequence as are LF and CRLF in CommonMark. But I also don't think we need to support unifying line endings to lone CR.

There's three valid line endings: CRLF, lone LF, lone CR. I propose the following spec for --end-of-line=keep: Unify all instances of the three distinct line ending sequences to only one of them. If the first line ending sequence is a CRLF, convert all line endings to CRLF. Else convert all line ending sequences to LF (this includes the special case of no line endings at all).

Aug 23 '22 21:08 hukkin

100% agree with that spec.

Aug 23 '22 22:08 marktsuchida

This should now be resolved with --end-of-line=keep.

Thanks @marktsuchida for the issue and @Cube707 for the PR!

Aug 24 '22 17:08 hukkin

Thank you @Cube707 and @hukkin! Works beautifully.

Aug 24 '22 20:08 marktsuchida

mdformat mdformat copied to clipboard

Add `--end-of-line=native`

Context

Proposal

Tasks and updates

mdformat
mdformat copied to clipboard