Verify
Verify copied to clipboard
Add options for BOM and EOF newline
When Verify generates its "received" files, it always creates a BOM at the top of the file, and it never adds a newline at the end. These both show with red glyphs on GitHub.
One can remove the BOM and add a newline manually in the "verified" file, and those are ignored by Verify when comparing, but diff tools will still show them if there are other changes in the file. Also it's a pain if you just want to accept the whole file.
Since people have different preferences in this regard, I propose addings some settings to control them when Verify generates the received file.
I'd also suggest the defaults be UTF8 without BOM, and always a newline at the end of the file. But that's just my preference.
There could also be an option for line endings, but generally I think this is handled fine by using Environment.Newline
and/or by git itself.
the BOM is because, when it is omitted, many diff tools incorrectly guess the encoding. if we add an option to omit it, other people will hit the encoding issue
as for the trailing newline, i dont quite understand what the friction is?
There could also be an option for line endings, but generally I think this is handled fine by using Environment.Newline and/or by git itself.
I'm second to some option "ignore line endings". We hit an issue - an original and committed verified
and produced received
differ only by line endings. And the check failure is unwanted in this case.
@nightroman
I'm second to some option "ignore line endings". We hit an issue - an original and committed verified and produced received differ only by line endings. And the check failure is unwanted in this case.
this is a bug. verify should ignore line ending differences. can u push up a repro to a GH repo so i can take a look
note that the choice of lf as a line ending was done for perf. it means that on reading a verified file, it can be read as a single string instead of split into lines and re-joined.
i have found that setting the git line endings mitigates any line ending mis match issues.
i have found that setting the git line endings mitigates any line ending mis match issues.
As far as this is documented, it is rather not a bug. We ourselves ended up setting this in .gitattributes.
So after all I'm not sure about my request about adding "ignore line endings" options.
But without hurting performance too much, only when files are different, Verify may check for line endings and fail with an error about not supported line endings if found.
without hurting performance too much, only when files are different, Verify may check for line endings and fail with an error about not supported line endings if found
happy to consider a pr for this
the BOM is because, when it is omitted, many diff tools incorrectly guess the encoding. if we add an option to omit it, other people will hit the encoding issue
Are they really old diff tools? Anything released in the last decade or so should be guessing UTF-8 by default, given that this is by far the most common encoding in use over that time, and becoming more so.
I'd also suggest the defaults be UTF8 without BOM, and always a newline at the end of the file. But that's just my preference.
Those are also my personal preferences for text files.
as for the trailing newline, i dont quite understand what the friction is?
- I use EditorConfig, with
insert_final_newline = true
, so whenever I save the verified file, it inserts a newline, which then doesn't match the received file in future runs. While I could configure EditorConfig to ignore these files, it should not be necessary. - On *nix, a file must end with a newline.
Suggested settings and defaults:
v19 (non-breaking):
- InsertBom = true
- InsertEofNewline = false
- NewLine = Environment.NewLine
v20 (breaking):
- InsertBom = false
- InsertEofNewline = Environment.OSVersion.Platform == PlatformID.Unix
- NewLine = Environment.NewLine
@SimonCropp What do you think?
Would a PR be accepted?
how about if i respect the settings defined in editorconfig ?
@duncand
Are they really old diff tools? Anything released in the last decade or so should be guessing UTF-8 by default, given that this is by far the most common encoding in use over that time, and becoming more so.
for starters beyond compare still has problems with encoding detection https://forum.scootersoftware.com/forum/beyond-compare-4-discussion/general/10560-set-default-format-to-utf8
and note that BC is probably, from my iterations with devs, the most common compare tool used with Verify.
i looked into respecting editorconfig and it is more complex than i thought.
- editorconfig can be hierarchical so means parsing and merging multiple files
- should we respect the editorconfig for where the verified file will exist or where the test file exist
So i could add settings for BOM and line endings, but i would really prefer to add more settings
would modifying your editorconfig and gitattributed be a viable alternative? https://github.com/VerifyTests/Verify#text-file-settings
EditorConfig support would be nice but is by no means essential. For me, just having settings to configure these two behaviors would be perfectly adequate. I really don't like having files with BOMs and no EOF new-line in the tree; it's problematic with the defaults for basically all modern tooling - that isn't Beyond Compare, apparently. (I use Verify.Tool, personally.)
I feel like the discussion about changing the default for these behaviors ought to be completely separate from introducing a setting to control them. Changing the default would seem to be a lot more controversial with guaranteed breakage.
There is an EditorConfig library that should handle the lookup of settings, but I'm not sure how it might affect performance. Since you are writing the verified file, I would expect it to use the settings for that file, although I'm not sure if the file has to exist (that would be strange).
A few settings might be the easier/safer option.
would modifying your editorconfig and gitattributed be a viable alternative?
It's messy, and Rider has issues with respecting EditorConfig.
OK. I will accept 3 PRs
- opt out of Bom
- opt in to Eof Newlin
- custom newline string
Was this implemented somewhere? I wanted to look into it in the future, I just haven't had the time in recent months.
@glen-84 TBH i still dont understand the friction.
using insert_final_newline = true in editor config
So have a custom rule for .verified.
whenever I save the verified file
I dont understand how this is a common path. you should never (or at least very rarely) need to "save" a verified file. there is tooling to accept a received file. and if u are using a diff tool to accept a change, it should respect the exact contents of the received file.
On *nix, a file must end with a newline.
if this is the case, how are the tests passing on linux?
@mattjohnsonpint
When Verify generates its "received" files, it always creates a BOM at the top of the file, and it never adds a newline at the end. These both show with red glyphs on GitHub.
can u share an image of this
So have a custom rule for .verified.
Ya, I mentioned:
While I could configure EditorConfig to ignore these files, it should not be necessary.
I dont understand how this is a common path. you should never (or at least very rarely) need to "save" a verified file. there is tooling to accept a received file. and if u are using a diff tool to accept a change, it should respect the exact contents of the received file.
The verified file is committed, so you merge changes into it when there's a diff. Any tool (in this case VS Code) that supports EditorConfig will insert an EOF newline when you save the updated file (as it should).
We just ignore the EOF newline difference for now, since the files still match if this is the only difference, but it's less than ideal (can lead to developer confusion).
if this is the case, how are the tests passing on linux?
It won't necessarily break your tests, it's just how files are defined in the POSIX standard.
See https://stackoverflow.com/a/729795/221528.
I'm not sure why there's resistance to following standard practices on a given platform.