tbump

UnicodeDecodeError: 'gbk' codec can't decode:

Open ffreemt opened this issue 4 years ago • 4 comments

...
    change_request
  File "c:\path-to-project-folder\.venv\lib\site-packages\tbump\file_bumper.py", line 219, in compute_patches_for_change_request
    old_lines = file_path.read_text().splitlines(keepends=False)
  File "C:\Python\Python37\lib\pathlib.py", line 1222, in read_text
    return f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 118: illegal multibyte sequence

A simple patch at line 219 of file_bumper.py fixes the problem: file_path.read_text() -> file_path.read_text("utf8")

It would be nice if the next version passed "utf8" explicitly here.
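For anyone who wants to reproduce this without a Windows machine that defaults to GBK, here is a minimal sketch. It assumes the file being bumped is UTF-8 and simulates the Windows default by passing "gbk" explicitly. The sample content and the `demo` helper are made up for illustration and are not part of tbump.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# Hypothetical file content: UTF-8 text containing one non-ASCII character.
SAMPLE = 'version = "1.0.0"  # price: \u20ac1\n'


def demo() -> Tuple[bool, str]:
    """Return (gbk_failed, text_read_as_utf8) for a UTF-8 encoded file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_bytes(SAMPLE.encode("utf-8"))

        # Simulate read_text() on a machine whose locale encoding is GBK.
        try:
            path.read_text(encoding="gbk")
            gbk_failed = False
        except UnicodeDecodeError:
            gbk_failed = True

        # The proposed patch: state the encoding explicitly.
        text = path.read_text(encoding="utf-8")
    return gbk_failed, text


if __name__ == "__main__":
    print(demo())
```

If the sketch behaves the same way on your machine, the failure depends only on the mismatch between the file's encoding and the one used to read it, not on Windows itself.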

ffreemt, Mar 27 '21 04:03

Hum. I need to think about this one.

I think tbump uses .encode() and .decode() without special arguments everywhere. In theory, tbump should use the default encoding of the platform it is running on and work out of the box everywhere, regardless of how your source files are encoded.

That being said, maybe I'm wrong. In that case, we should use encode() and decode() with the utf-8 encoding explicitly set everywhere and document that tbump only works for UTF-8 encodings.
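A small sketch that may help settle which default applies where. As far as I can tell, in Python 3 `str.encode()` and `bytes.decode()` with no argument always use UTF-8, while `open()` and `Path.read_text()` with no encoding fall back to the locale's preferred encoding (unless UTF-8 mode is enabled). The helper names below are hypothetical and only there for illustration.

```python
import locale


def default_file_encoding() -> str:
    """Encoding that open() and Path.read_text() use when none is passed
    (outside of UTF-8 mode)."""
    return locale.getpreferredencoding(False)


def encode_with_default(text: str) -> bytes:
    # str.encode() with no argument uses UTF-8, whatever the locale is.
    return text.encode()


if __name__ == "__main__":
    print("open()/read_text() default:", default_file_encoding())
    print("str.encode() default gives:", encode_with_default("caf\u00e9"))
```

On a Windows machine set up for Simplified Chinese, the first value would typically be a GBK variant, which would explain why only the file-reading call fails while the .encode()/.decode() calls do not.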

At any rate, I'm pretty sure that just patching this one line is not enough.

What do you think?

dmerejkowsky, Mar 30 '21 16:03

Hi,

Sometimes a file does not follow the system encoding; in that case the only way is to specify the encoding explicitly. Calling encode/decode with the default arguments on the contents of such a file will probably fail as well.
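One way "specify the encoding somehow" could look, as a rough sketch only: a reader that takes an ordered list of candidate encodings and reports which one worked. `read_with_fallback` and `demo` are hypothetical helpers, not tbump functions, and the candidate list is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Sequence, Tuple


def read_with_fallback(
    path: Path, encodings: Sequence[str] = ("utf-8", "gbk")
) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding) for the first
    candidate encoding that decodes the file without error."""
    data = path.read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"{path} could not be decoded with any of {list(encodings)}")


def demo() -> Tuple[str, str]:
    """Write a GBK-encoded file and read it back through the helper."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "legacy.txt"
        path.write_bytes("\u4e2d\u6587".encode("gbk"))
        return read_with_fallback(path)


if __name__ == "__main__":
    print(demo())
```

Guessing like this is fragile, because a permissive encoding can decode the wrong bytes without raising, so an explicit per-file setting is probably safer. Returning the encoding matters too: the patched file should be written back with the same one.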

cgestes, Mar 30 '21 17:03

Not too sure, but I patched that line and everything seems to be fine.

yucongo, May 21 '21 01:05

Yeah but you should not have to!

It would be good to reproduce this and figure out the root cause, but I don't have access to a Windows machine that uses the gbk encoding ...

I don't want to merge a patch that hard-codes utf-8 without understanding all the implications.

Let's ask for help.

dmerejkowsky, Sep 11 '21 17:09