Unicode error when viewing a source file that is not UTF-8 encoded

Open kuna opened this issue 5 years ago • 3 comments

I encountered this issue when I attempted to open a cp949-encoded file. I think the same issue would occur with a shift-jis encoded file, and both cases should be easy to reproduce.

output 'utf-8' codec can't decode byte 0xc0 in position ~

The problem is in the backend.py:read_file method. It reads the source file with the built-in open(), which decodes the text with a default encoding (UTF-8 in my case). An exception occurs when the file is not encoded that way. I think codecs.open with the correct encoding option is necessary. The correct encoding should either be passed as a gdbgui parameter or be taken from a session environment variable.

For now I have modified line 689 of backend.py to take the encoding from my environment variable, and the problem no longer occurs.

...
sys_enc = os.getenv('LC_ALL', 'utf-8')
with codecs.open(path, "r", encoding=sys_enc) as f:
...
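
A slightly fuller sketch of the same idea follows, in case it helps anyone applying the workaround. It is not gdbgui's actual code: the function name `read_source_lines` and the environment variable `GDBGUI_SOURCE_ENCODING` are hypothetical names chosen for illustration. It assumes the encoding comes from an explicit argument first, then from that environment variable, and falls back to UTF-8, replacing undecodable bytes rather than raising.

```python
import os


def read_source_lines(path, encoding=None, fallback="utf-8"):
    """Read a source file as text using an explicit encoding.

    `encoding` stands in for a user-supplied setting (hypothetical).
    When it is None, the hypothetical GDBGUI_SOURCE_ENCODING environment
    variable is consulted, and `fallback` is used if neither is set.
    """
    enc = encoding or os.getenv("GDBGUI_SOURCE_ENCODING") or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes, so a wrong
    # guess degrades the display instead of raising UnicodeDecodeError.
    with open(path, "r", encoding=enc, errors="replace") as f:
        return f.read().splitlines()
```

With this shape, a call such as `read_source_lines(path, encoding="cp949")` decodes the file correctly, while a wrong encoding still returns text with replacement characters instead of failing the request.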

Environment: Ubuntu 14.04, gdbgui 0.13.2.0 (installed from pip), gdb 8.2, Firefox 66.0.3

Thanks.

P.S. I fixed the error message, as the previous one was incorrect.

kuna, Sep 10 '19 02:09

Hi kuna, I'm facing the same problem now. I tried to follow your solution but I couldn't find backend.py. Could you give me some advice when you have time? Thanks!

ruoruo220, Oct 12 '21 04:10

It looks like you are looking for https://github.com/cs01/gdbgui/blob/531d89890c0b4bd3bbf15d266b9ec25a2c7eebaa/gdbgui/server/http_routes.py#L55

Just out of interest: what is the output of locale? According to https://docs.python.org/3.9/library/functions.html#open Python uses the preferred user encoding (https://docs.python.org/3.9/library/locale.html#locale.getpreferredencoding), so setting up LANG and friends should help, shouldn't it?

Looks like #347 is related to this.

GitMensch, Dec 30 '21 21:12

Hi @GitMensch, I was curious about your suggestion and ran a small test:

LANG=ko_KR.cp949 python test.py test_cp949.txt
ko_KR.cp949
Traceback (most recent call last):
  File "/Users/dongwon/dev/test.py", line 8, in <module>
    for l in f.readlines():
  File "/Users/dongwon/.pyenv/versions/3.10.2/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 18: invalid start byte

LANG=cp949 also didn't work, so LANG does not seem to have any effect in this case. From the 'locale' documentation, it seems the encoding is always set to UTF-8, so that wouldn't help. The methods in this link (a Korean document) may work, but either way the code should be changed as I described.
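
For anyone who wants to check their own setup, here is a small diagnostic sketch. The helper name `describe_default_text_encoding` is made up for this example; the calls it uses are from the standard `locale` and `sys` modules. It only reports what Python believes the defaults are and does not change anything.

```python
import locale
import sys


def describe_default_text_encoding():
    """Report the settings that influence how open() decodes text by default."""
    return {
        # Encoding used by open() when no encoding argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when Python UTF-8 Mode is enabled (PYTHONUTF8=1 or -X utf8)
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding used for file names, shown for comparison
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_default_text_encoding().items():
        print(f"{name}: {value}")
```

If `preferred_encoding` still reports UTF-8 after setting LANG, that would be consistent with the traceback above, and passing an explicit `encoding` to `open()` remains the reliable route.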

kuna, Mar 19 '22 04:03