tbump
UnicodeDecodeError: 'gbk' codec can't decode:
...
change_request
File "c:\path-to-project-folder\.venv\lib\site-packages\tbump\file_bumper.py", line 219, in compute_patches_for_change_request
old_lines = file_path.read_text().splitlines(keepends=False)
File "C:\Python\Python37\lib\pathlib.py", line 1222, in read_text
return f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 118: illegal multibyte sequence
A simple patch at file_bumper.py, line 219, fixes the problem:
file_path.read_text() -> file_path.read_text("utf8")
It would be nice if the next version added this "utf8" argument.
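As an illustration of the proposed change, here is a minimal sketch of reading a file with an explicit encoding. The helper name `read_lines_utf8` and the sample file are hypothetical and not part of tbump; only `Path.read_text` and `splitlines` come from the traceback above.

```python
import tempfile
from pathlib import Path
from typing import List


def read_lines_utf8(file_path: Path) -> List[str]:
    """Hypothetical helper: decode the file as UTF-8, ignoring the locale default."""
    return file_path.read_text(encoding="utf-8").splitlines(keepends=False)


with tempfile.TemporaryDirectory() as tmp:
    # Sample file written as UTF-8 bytes, containing a non-ASCII character.
    sample = Path(tmp) / "version.txt"
    sample.write_bytes('version = "1.0.0"\n# price: 10 €\n'.encode("utf-8"))
    lines = read_lines_utf8(sample)
```

With the encoding passed explicitly, the result no longer depends on the locale of the machine, but files that are not UTF-8 would then fail instead.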
Hum. I need to think about this one.
I think tbump uses .encode() and .decode() without special arguments everywhere. In theory, tbump should use the default encoding of the platform it is running on and work out of the box everywhere, regardless of how your source files are encoded.
That being said, maybe I'm wrong. In that case, we should use encode() and decode() with the utf-8 encoding explicitly set everywhere and document that tbump only works for UTF-8 encodings.
At any rate, I'm pretty sure that just patching this one line is not enough.
What do you think?
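For reference, a small sketch of the defaults involved, assuming a Python 3 version like the one in the traceback and no UTF-8 mode enabled: `str.encode()` and `bytes.decode()` default to UTF-8 when called without arguments, while `open()` and `Path.read_text()` without an `encoding` argument fall back to the locale's preferred encoding. The sample string is made up for illustration.

```python
import locale

text = "tbump €"

# str.encode() and bytes.decode() default to UTF-8 when no argument is given.
encoded = text.encode()
decoded = encoded.decode()

# open() and Path.read_text() without an explicit encoding use this value
# instead, which depends on the machine (for example a GBK code page on
# some Windows setups).
locale_encoding = locale.getpreferredencoding(False)
print("locale default for text files:", locale_encoding)
```

If that holds for the Python version in use, the platform-dependent step would be the file read rather than the encode/decode calls.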
Hi,
Sometimes a file does not follow the system encoding; in that case the only way is to specify the encoding somehow. Also, calling encode/decode on such a file will probably fail.
On Tue, Mar 30, 2021, 19:11 Dimitri Merejkowsky @.***> wrote:
Hum. I need to think about this one.
I think tbump uses .encode() and .decode() without special arguments everywhere. In theory, tbump should use the default encoding of the platform it is running on and work out of the box everywhere, regardless of how your source files are encoded.
That being said, maybe I'm wrong. In that case, we should use encode() and decode() with the utf-8 encoding explicitly set everywhere and document that tbump only works for UTF-8 encodings.
At any rate, I'm pretty sure that just patching this one line is not enough.
What do you think?
View it on GitHub: https://github.com/dmerejkowsky/tbump/issues/89#issuecomment-810391086
Not too sure, but I patched that line and everything is fine it seems.
Yeah but you should not have to!
It would be good to reproduce this and figure out the root cause, but I don't have access to a Windows machine that uses the gbk encoding...
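One possible way to approximate the failure without such a machine is to decode a UTF-8 file with the gbk codec explicitly, which should mimic what `read_text()` does when the locale default is gbk. This is only a sketch under that assumption: the sample content is invented, and the failing byte and position will differ from the ones in the report.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A UTF-8 encoded file with a non-ASCII character right before a newline.
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("price: 10 €\n".encode("utf-8"))

    # Simulate a machine whose default encoding is gbk.
    try:
        sample.read_text(encoding="gbk")
        reproduced = False
    except UnicodeDecodeError as error:
        reproduced = True
        print("reproduced:", error)

    # The same file decodes cleanly when read as UTF-8.
    as_utf8 = sample.read_text(encoding="utf-8")
```

Note that some UTF-8 byte sequences happen to be valid gbk and decode silently into wrong characters, so an absence of errors on a given file would not prove the read is correct.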
I don't want to merge a patch that hard-codes utf-8 without understanding all the implications.
Let's ask for help.