setup-cfg-fmt Setup-cfg-fmt doesn't respect newline convention of source file nor read/write it with the correct encoding

Setup-cfg-fmt does not respect the EOL of the setup.cfg, and instead converts it to a platform-dependent default. I'm not aware of a case where that would be the desirable behavior, whereas many/most projects standardize on one newline convention (typically \n) for all source files, which this breaks. Furthermore, if using an EoL fixer with pre-commit, it will catch this but aside from extra noise in the output, if it isn't sequenced after this hook in the config file, it will cause the next run to fail too.

Fixing this is very simple—just check the current EoL character when reading in the file, and pass that explicitly on writing it. So at the top of format_file(), on the line after read()ing the file, you'd just add newline = f.newlines, and then pass newline=newline in the call to open() when writing it,

In addition, neither open() call specifies the correct encoding to use, which means the platform and environment-dependent locale encoding will be assumed. As setup.cfg is required to be UTF-8, this means loading will fail on Windows if any non-ASCII characters are present (common in e.g. authors' names), or on any platform that doesn't have UTF-8 mode enabled, and even if loading were to succeed, the file would be written in the incorrect encoding. Furthermore, not specifying an encoding when opening in text mode will be an optional warning in Python 3.10, with a DeprecationWarning planned to follow in later versions. To fix this, simply explicitly pass encoding="utf-8" to both calls.

Happy to submit a PR to fix both of these if you'd like. Thanks!

Aug 13 '21 04:08 CAM-Gerlach

adding encoding='utf-8' seems fine, as for picking a line ending I'd lean towards always using \n -- I don't think your suggestion of newline = f.newlines works, that value is always None and determining mixed newlines or the proper ending seems out of scope and/or difficult (and/or better handled by a tool that actually is intended to solve that problem)

Aug 15 '21 23:08 asottile

I don't think your suggestion of newline = f.newlines works, that value is always None

That attribute is lazy and you have to do a .read() first, or it will be None as you observed, which is why my suggestion added it it after the existing read().

determining mixed newlines or the proper ending seems out of scope and/or difficult (and/or better handled by a tool that actually is intended to solve that problem)

Good point on the mixed newlines; since that's a relatively uncommon edge case I hadn't considered it. However, in that case (where per the docs, f.newlines will return a tuple of line endings, so if f.newlines is not a str instance, it can simply fall back to \n as you suggest.

Other than cases of mixed EOLs where the file must be conformed to something, I'd think arbitrarily changing line ending, generally a per-project default handled by other hooks and orthogonal to the standards and conventions used in setup.cfg, would be out of scope here.

as for picking a line ending I'd lean towards always using \n

While hardcoding it would work for my particular application, it won't solve this problem for any projects that use another line ending, and potentially create a new one for those that use the platform default. So I'd suggest respecting the existing line ending for the common/simple case, and only conforming if its non-trivial.

Aug 16 '21 23:08 CAM-Gerlach

setup-cfg-fmt setup-cfg-fmt copied to clipboard

Setup-cfg-fmt doesn't respect newline convention of source file nor read/write it with the correct encoding

setup-cfg-fmt
setup-cfg-fmt copied to clipboard