Verify icon indicating copy to clipboard operation
Verify copied to clipboard

Add options for BOM and EOF newline

Open mattjohnsonpint opened this issue 2 years ago • 4 comments

When Verify generates its "received" files, it always creates a BOM at the top of the file, and it never adds a newline at the end. These both show with red glyphs on GitHub.

One can remove the BOM and add a newline manually in the "verified" file, and those are ignored by Verify when comparing, but diff tools will still show them if there are other changes in the file. Also it's a pain if you just want to accept the whole file.

Since people have different preferences in this regard, I propose addings some settings to control them when Verify generates the received file.

I'd also suggest the defaults be UTF8 without BOM, and always a newline at the end of the file. But that's just my preference.

There could also be an option for line endings, but generally I think this is handled fine by using Environment.Newline and/or by git itself.

mattjohnsonpint avatar Sep 02 '22 19:09 mattjohnsonpint

the BOM is because, when it is omitted, many diff tools incorrectly guess the encoding. if we add an option to omit it, other people will hit the encoding issue

as for the trailing newline, i dont quite understand what the friction is?

SimonCropp avatar Sep 08 '22 09:09 SimonCropp

There could also be an option for line endings, but generally I think this is handled fine by using Environment.Newline and/or by git itself.

I'm second to some option "ignore line endings". We hit an issue - an original and committed verified and produced received differ only by line endings. And the check failure is unwanted in this case.

nightroman avatar Sep 09 '22 12:09 nightroman

@nightroman

I'm second to some option "ignore line endings". We hit an issue - an original and committed verified and produced received differ only by line endings. And the check failure is unwanted in this case.

this is a bug. verify should ignore line ending differences. can u push up a repro to a GH repo so i can take a look

note that the choice of lf as a line ending was done for perf. it means that on reading a verified file, it can be read as a single string instead of split into lines and re-joined.

i have found that setting the git line endings mitigates any line ending mis match issues.

SimonCropp avatar Sep 09 '22 23:09 SimonCropp

i have found that setting the git line endings mitigates any line ending mis match issues.

As far as this is documented, it is rather not a bug. We ourselves ended up setting this in .gitattributes.

So after all I'm not sure about my request about adding "ignore line endings" options.

But without hurting performance too much, only when files are different, Verify may check for line endings and fail with an error about not supported line endings if found.

nightroman avatar Sep 10 '22 03:09 nightroman

without hurting performance too much, only when files are different, Verify may check for line endings and fail with an error about not supported line endings if found

happy to consider a pr for this

SimonCropp avatar Jan 09 '23 07:01 SimonCropp

the BOM is because, when it is omitted, many diff tools incorrectly guess the encoding. if we add an option to omit it, other people will hit the encoding issue

Are they really old diff tools? Anything released in the last decade or so should be guessing UTF-8 by default, given that this is by far the most common encoding in use over that time, and becoming more so.

duncand avatar Feb 13 '23 05:02 duncand

I'd also suggest the defaults be UTF8 without BOM, and always a newline at the end of the file. But that's just my preference.

Those are also my personal preferences for text files.

duncand avatar Feb 13 '23 05:02 duncand

as for the trailing newline, i dont quite understand what the friction is?

  1. I use EditorConfig, with insert_final_newline = true, so whenever I save the verified file, it inserts a newline, which then doesn't match the received file in future runs. While I could configure EditorConfig to ignore these files, it should not be necessary.
  2. On *nix, a file must end with a newline.

Suggested settings and defaults:

v19 (non-breaking):

  • InsertBom = true
  • InsertEofNewline = false
  • NewLine = Environment.NewLine

v20 (breaking):

  • InsertBom = false
  • InsertEofNewline = Environment.OSVersion.Platform == PlatformID.Unix
  • NewLine = Environment.NewLine

glen-84 avatar Mar 08 '23 11:03 glen-84

@SimonCropp What do you think?

glen-84 avatar Mar 13 '23 13:03 glen-84

Would a PR be accepted?

glen-84 avatar May 08 '23 17:05 glen-84

how about if i respect the settings defined in editorconfig ?

SimonCropp avatar May 10 '23 03:05 SimonCropp

@duncand

Are they really old diff tools? Anything released in the last decade or so should be guessing UTF-8 by default, given that this is by far the most common encoding in use over that time, and becoming more so.

for starters beyond compare still has problems with encoding detection https://forum.scootersoftware.com/forum/beyond-compare-4-discussion/general/10560-set-default-format-to-utf8
and note that BC is probably, from my iterations with devs, the most common compare tool used with Verify.

i looked into respecting editorconfig and it is more complex than i thought.

  • editorconfig can be hierarchical so means parsing and merging multiple files
  • should we respect the editorconfig for where the verified file will exist or where the test file exist

So i could add settings for BOM and line endings, but i would really prefer to add more settings

would modifying your editorconfig and gitattributed be a viable alternative? https://github.com/VerifyTests/Verify#text-file-settings

SimonCropp avatar May 10 '23 13:05 SimonCropp

EditorConfig support would be nice but is by no means essential. For me, just having settings to configure these two behaviors would be perfectly adequate. I really don't like having files with BOMs and no EOF new-line in the tree; it's problematic with the defaults for basically all modern tooling - that isn't Beyond Compare, apparently. (I use Verify.Tool, personally.)

I feel like the discussion about changing the default for these behaviors ought to be completely separate from introducing a setting to control them. Changing the default would seem to be a lot more controversial with guaranteed breakage.

alexrp avatar May 10 '23 14:05 alexrp

There is an EditorConfig library that should handle the lookup of settings, but I'm not sure how it might affect performance. Since you are writing the verified file, I would expect it to use the settings for that file, although I'm not sure if the file has to exist (that would be strange).

A few settings might be the easier/safer option.

would modifying your editorconfig and gitattributed be a viable alternative?

It's messy, and Rider has issues with respecting EditorConfig.

glen-84 avatar May 10 '23 19:05 glen-84

OK. I will accept 3 PRs

  • opt out of Bom
  • opt in to Eof Newlin
  • custom newline string

SimonCropp avatar May 13 '23 10:05 SimonCropp

Was this implemented somewhere? I wanted to look into it in the future, I just haven't had the time in recent months.

glen-84 avatar Sep 07 '23 15:09 glen-84

@glen-84 TBH i still dont understand the friction.

using insert_final_newline = true in editor config

So have a custom rule for .verified.

whenever I save the verified file

I dont understand how this is a common path. you should never (or at least very rarely) need to "save" a verified file. there is tooling to accept a received file. and if u are using a diff tool to accept a change, it should respect the exact contents of the received file.

On *nix, a file must end with a newline.

if this is the case, how are the tests passing on linux?

SimonCropp avatar Sep 08 '23 00:09 SimonCropp

@mattjohnsonpint

When Verify generates its "received" files, it always creates a BOM at the top of the file, and it never adds a newline at the end. These both show with red glyphs on GitHub.

can u share an image of this

SimonCropp avatar Sep 08 '23 00:09 SimonCropp

So have a custom rule for .verified.

Ya, I mentioned:

While I could configure EditorConfig to ignore these files, it should not be necessary.


I dont understand how this is a common path. you should never (or at least very rarely) need to "save" a verified file. there is tooling to accept a received file. and if u are using a diff tool to accept a change, it should respect the exact contents of the received file.

The verified file is committed, so you merge changes into it when there's a diff. Any tool (in this case VS Code) that supports EditorConfig will insert an EOF newline when you save the updated file (as it should).

We just ignore the EOF newline difference for now, since the files still match if this is the only difference, but it's less than ideal (can lead to developer confusion).

if this is the case, how are the tests passing on linux?

It won't necessarily break your tests, it's just how files are defined in the POSIX standard.

See https://stackoverflow.com/a/729795/221528.

I'm not sure why there's resistance to following standard practices on a given platform.

glen-84 avatar Sep 10 '23 11:09 glen-84