gdbgui
Unicode error when viewing a source file that is not UTF-8 encoded
I encountered this issue when I attempted to open a cp949-encoded file. I think the same issue would occur with a Shift-JIS-encoded file, and both cases are easy to reproduce.
'utf-8' codec can't decode byte 0xc0 in position ~
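For anyone who wants to reproduce this outside gdbgui, here is a minimal sketch. The sample text and the helper name reproduce_decode_error are made up for illustration; the only assumption is that the file is decoded as UTF-8, which the error message above suggests.

```python
import os
import tempfile


def reproduce_decode_error():
    """Write CP949 bytes to a temp file, then try to read them back as UTF-8."""
    # Hypothetical sample text; any Korean string encoded as CP949 should do.
    data = "한글 주석".encode("cp949")
    with tempfile.NamedTemporaryFile(delete=False, suffix=".c") as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        # Force UTF-8 so the result does not depend on the local default.
        with open(path, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        return exc
    finally:
        os.remove(path)
    return None


if __name__ == "__main__":
    print(reproduce_decode_error())
```

The function returns the exception instead of raising it, so the message can be compared with the one gdbgui shows.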
The problem is in the backend.py:read_file method. It reads the source file with a plain open() call, which decodes the file as UTF-8 by default in my environment. An exception occurs when the file is not UTF-8 encoded. I think codecs.open with the correct encoding option is necessary. The correct encoding should either be passed as a gdbgui parameter or be taken from a session environment variable.
For now I have modified line backend.py:689 to take the encoding from my environment variable, and the problem no longer occurs.
...
sys_enc = os.getenv('LC_ALL', 'utf-8')
with codecs.open(path, "r", encoding=sys_enc) as f:
...
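One caveat with the snippet above: locale variables such as LC_ALL often hold values like ko_KR.cp949, which is not a codec name Python can look up directly. Below is a rough sketch of how the codeset part could be extracted and validated first. The helper names encoding_from_env and read_source are hypothetical and not part of gdbgui.

```python
import codecs
import os


def encoding_from_env(var="LC_ALL", default="utf-8"):
    """Return a usable codec name from a locale-style environment variable."""
    value = os.environ.get(var, "")
    # Locale values usually look like "ko_KR.cp949" or "en_US.UTF-8@euro";
    # keep only the codeset between "." and an optional "@modifier".
    codeset = value.partition(".")[2].partition("@")[0] or value
    if not codeset:
        return default
    try:
        # codecs.lookup() raises LookupError for unknown names such as "C".
        return codecs.lookup(codeset).name
    except LookupError:
        return default


def read_source(path, encoding=None):
    """Read a source file with an explicit encoding (hypothetical helper)."""
    encoding = encoding or encoding_from_env()
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

Falling back to UTF-8 for unknown or missing values keeps the current behaviour for users who have not set anything.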
Environment: Ubuntu 14.04, gdbgui 0.13.2.0 (installed from pip), gdb 8.2, Firefox 66.0.3.
Thanks.
P.S. I corrected the error message above, as the previous one was incorrect.
Hi kuna, I'm facing the same problem now. I tried to follow your solution, but I couldn't find backend.py. At your convenience, could you give me some advice? Thanks!
It looks like you are searching for https://github.com/cs01/gdbgui/blob/531d89890c0b4bd3bbf15d266b9ec25a2c7eebaa/gdbgui/server/http_routes.py#L55
Just out of interest: what is the output of locale?
According to https://docs.python.org/3.9/library/functions.html#open Python's open() uses the locale's preferred encoding (https://docs.python.org/3.9/library/locale.html#locale.getpreferredencoding), so setting LANG and friends should help, shouldn't it?
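A quick way to check what open() would pick up on a given machine is sketched below. The function name default_text_encoding is only illustrative; the calls themselves (locale.getpreferredencoding and sys.flags.utf8_mode) are standard library.

```python
import locale
import os
import sys


def default_text_encoding():
    """Report what open() would use when no encoding argument is given."""
    return {
        # Per the open() docs, text mode without encoding= uses
        # locale.getpreferredencoding(False).
        "preferred": locale.getpreferredencoding(False),
        # UTF-8 mode (python -X utf8 or PYTHONUTF8=1) overrides the locale.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Locale-related variables as seen by this process.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for key, value in default_text_encoding().items():
        print(f"{key}: {value}")
```

If preferred stays at UTF-8 no matter what LANG is set to, the locale settings are not reaching the decoder and an explicit encoding argument is needed.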
Looks like #347 is related to this.
Hi @GitMensch, I got interested in your suggestion and tested a little:
LANG=ko_KR.cp949 python test.py test_cp949.txt
ko_KR.cp949
Traceback (most recent call last):
  File "/Users/dongwon/dev/test.py", line 8, in <module>
    for l in f.readlines():
  File "/Users/dongwon/.pyenv/versions/3.10.2/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 18: invalid start byte
LANG=cp949 also didn't work. So LANG does not seem to be effective in this case.
And from the 'locale' documentation, it seems the encoding is always set to UTF-8 here, so that wouldn't help.
The methods in this link (a Korean document) may work, but either way the code should be changed along these lines.
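As one possible shape for such a change, here is a sketch that tries a list of candidate encodings and never raises. This is an assumption about what a fix could look like, not what gdbgui actually does; the function names and the default candidate list are hypothetical.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp949", "shift_jis")):
    """Decode bytes with the first encoding that succeeds.

    Returns (text, encoding_used).
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"


def read_with_fallback(path, encodings=("utf-8", "cp949", "shift_jis")):
    """Read a file as bytes and decode it with decode_with_fallback()."""
    with open(path, "rb") as f:
        return decode_with_fallback(f.read(), encodings)
```

Order matters: legacy multi-byte encodings can accept each other's bytes and produce wrong text without raising, so a user-supplied encoding should probably come first in the list and guessing should only be a fallback.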