UTF-8 implicity doing bad stuff on Windows
#201 was caused by two different libraries making assumptions about how to save text files.
This appears to have been developed on Macs and Linux, upon which I think Python might be assuming everything gets saved as utf-8 by default. But on Windows, I think Python may be defaulting to its own character set, which IIRC is cp-1252. Character encoding is not explicit on any of these text file operations.
So on Windows many text operations are going to fail because UTF-8 characters are getting pulled from web pages and Excel files.
Per Pep 20, explicit is better than implicit.
It may not affect anything to permanently add encoding="utf-8" to these file operations, but I'd want to test that better first.
It's a decent bet similar flaws exist in warn-transformer, as well.
in cache
cache.write includes this:
with open(out, "w", newline="") as fh:
cache.read includes this:
with open(path, newline="") as infile:
cache.reaad_csv includes this:
with open(path) as fh:
and in utils ...
utils.write_dict_rows_to_csv includes this:
with open(output_path, mode, newline="") as f:
utils.write_dict_rows_to_csv includes this:
with open(output_path, mode, newline="") as f:
To clarify: "It may not affect anything to permanently add encoding="utf-8" to these file operations, but I'd want to test that better first." -- I'd meant on Linux and Macs. Hopefully that would prevent other problems within Windows.
Opening a bug in warn-transformer along similar lines.
Looks like the encoding default is indeed UTF-8 on Linux and Mac, but cp1252 on Windows. Fixing the libraries would be easy.
https://peps.python.org/pep-0686/