ignore source file decoding errors

Open Hyphen90 opened this issue 4 years ago • 10 comments

Hello,

I'm debugging my C application and everything works so far, as long as I'm debugging small source files. But with larger ones (e.g. 84428 bytes), your source window throws an error saying that it could not find the file. When I use the list command in the gdb console, it shows me the source code. And I have no problem with smaller files in the same location.

Thanks ahead!

Hyphen90 avatar Aug 19 '20 13:08 Hyphen90

It's not a problem with the file size. The files contain some special characters (German ones), and Python throws this error, for example:

'utf-8' codec can't decode byte 0xe4 in position 15179: invalid continuation byte

When I remove the character from the file, it can be loaded. Stack Overflow is full of such cases; perhaps you can work around it in the future.
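The message above is what Python reports when bytes are not valid UTF-8. A minimal sketch that reproduces it, assuming the file was saved in a single-byte encoding such as Latin-1 or Windows-1252, where `ä` is the single byte 0xe4. The sample string is made up for illustration and is not taken from the reporter's file:

```python
# Hypothetical sample: a C comment containing a German umlaut.
text = "// Zähler"

# In Latin-1 (and Windows-1252), 'ä' is stored as the single byte 0xe4.
raw = text.encode("latin-1")
assert b"\xe4" in raw

# In UTF-8, 0xe4 announces a three-byte sequence, so the plain ASCII byte
# that follows it is rejected as an invalid continuation byte.
try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The byte position in the message depends on where the character sits in the file, so it will differ from the position reported above.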

Hyphen90 avatar Aug 19 '20 15:08 Hyphen90

The server uses the Python function open. It uses whatever encoding is defined for the system, and if it can't decode something, it raises an error. https://docs.python.org/3/library/functions.html#open

It sounds like your system encoding

> python -c "import locale; print(locale.getpreferredencoding())"

doesn't match that file.

In any case, a good fix for gdbgui would be to not raise errors when encoding issues occur.
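To illustrate the point about the default encoding: a small sketch, assuming a throwaway file written as Latin-1. It prints the encoding that `open` falls back to in text mode when none is given, then reads the file back with an explicit `encoding` argument so the result no longer depends on the system locale. The file name and contents are hypothetical:

```python
import locale
import os
import tempfile

# The encoding open() uses in text mode when no encoding argument is passed
# (Python's UTF-8 mode, if enabled, changes this to UTF-8).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.c")  # hypothetical source file

    # Write the file as Latin-1, the way an "ANSI" editor might save it.
    with open(path, "w", encoding="latin-1") as f:
        f.write("// Zähler\n")

    # Passing the matching encoding explicitly makes the read independent
    # of whatever the locale reports.
    with open(path, "r", encoding="latin-1") as f:
        content = f.read()

print(content)
```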

cs01 avatar Aug 22 '20 23:08 cs01

Thanks for your reply. The file is encoded in UTF-8 and the system encoding is also UTF-8.

But if gdbgui would "ignore" such errors and at least display the source, that would help a lot.

Hyphen90 avatar Aug 23 '20 16:08 Hyphen90

> Thanks for your reply. The file is encoded in UTF-8 and the system encoding is also UTF-8.

I wonder why the error is occurring then. Is the character invalid?

I'm planning to change

with open(path, "r") as f:

to

with open(path, "r", errors="replace") as f:

Do you think that will fix it?
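A quick way to see what `errors="replace"` does, as a sketch under assumptions: the file below is a made-up sample containing one Latin-1 byte, and `encoding="utf-8"` is passed explicitly only to make the result the same on every system (the proposed change relies on the default encoding instead):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.c")  # hypothetical source file

    # Write raw bytes: 0xe4 is 'ä' in Latin-1 but is not valid UTF-8 here.
    with open(path, "wb") as f:
        f.write(b"// Z\xe4hler\n")

    # With errors="replace", undecodable bytes become U+FFFD instead of
    # raising UnicodeDecodeError, so the rest of the file is still readable.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        content = f.read()

print(content)
```

The trade-off is that the affected characters are shown as the replacement character rather than as the original letter, but the source still loads.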

cs01 avatar Aug 23 '20 17:08 cs01

Yes, sorry. I converted an ANSI file containing special characters to UTF-8 with Notepad++, and the error was thrown again. When I remove all "invalid" characters by hand, it's no problem. After that I can add new special characters without a problem.

I've added 'errors="replace"' to your open call on line 688 in backend.py and it works without a problem.

Thank you very much! And gdbgui is a fantastic tool.

Hyphen90 avatar Aug 24 '20 08:08 Hyphen90

Thank you for the feedback!

cs01 avatar Aug 30 '20 03:08 cs01

I found a workaround for this issue. gdbgui can only read UTF-8 files; files encoded in GBK can't be loaded. The patch was tested on CentOS 7 with gdbgui 0.14.0.2. In /usr/local/lib/python3.6/site-packages/gdbgui/server/http_routes.py, change:

    with open(path, "r") as f:

to:

    f=open(path,'rb+')
    content=f.read()
    source_encoding='utf-8'
    try:
        content.decode('utf-8').encode('utf-8')
        source_encoding='utf-8'
    except:
        try:
            content.decode('gbk').encode('utf-8')
            source_encoding='gbk'
        except:
            try:
                content.decode('gb2312').encode('utf-8')
                source_encoding='gb2312'
            except:
                try:
                    content.decode('gb18030').encode('utf-8')
                    source_encoding='gb18030'
                except:
                    try:
                        content.decode('big5').encode('utf-8')
                        source_encoding='big5'
                    except:
                        content.decode('cp936').encode('utf-8')
                        source_encoding='cp936'
    f.close()
    print("Codec of file is %s" % source_encoding)
    with codecs.open(path, "r", source_encoding) as f:

And add import codecs at the top of http_routes.py. @cs01 http_routes.py.txt
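The same try-each-encoding idea can be written as a loop. This is only a sketch, not gdbgui's code: `detect_encoding` and `read_source` are hypothetical helper names, and the candidate list is an assumption (GB18030 is listed instead of GBK and GB2312 because it is a superset of both):

```python
def detect_encoding(raw, candidates=("utf-8", "gb18030", "big5")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def read_source(path):
    """Read a source file as text, trying each candidate encoding in turn."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding = detect_encoding(raw)
    if encoding is None:
        # Nothing matched: fall back to UTF-8 and mark the bad bytes.
        return raw.decode("utf-8", errors="replace")
    return raw.decode(encoding)


# Usage: GBK-encoded bytes are not valid UTF-8, so the next candidate wins.
print(detect_encoding("中文".encode("gbk")))
```

A successful decode is a heuristic, not proof of the real encoding, since many byte sequences are valid in more than one legacy codec. The order of the candidates therefore decides ambiguous cases.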

Fenglingang avatar Apr 15 '21 14:04 Fenglingang

So we do have a change, confirmed as working, that has not been applied yet. I guess after the refactoring it should now go to https://github.com/cs01/gdbgui/blob/531d89890c0b4bd3bbf15d266b9ec25a2c7eebaa/gdbgui/server/http_routes.py#L55

Friendly ping.

Note: I see no issues with gdbgui and ISO-8859-15 encoded source files containing German umlauts, but locale also shows that this is the encoding configured on my system, and according to the Python docs for open() the locale's preferred encoding is used by default. So @Hyphen90 and @Fenglingang, you may want to adjust your locale settings before starting gdbgui.

GitMensch avatar Dec 30 '21 21:12 GitMensch

Hi, unfortunately I do not have the bandwidth to address this issue. This is a hobby project, and between my full-time job and a new child I just don't have time to work on this for the time being. Apologies if this inconveniences you.

cs01 avatar Dec 30 '21 22:12 cs01

I mostly wondered about the state. If someone knows whether that new file is the correct place, I would give a PR a try... maybe I'll just try it in any case.

GitMensch avatar Dec 30 '21 22:12 GitMensch

> I found a workaround for this issue. [...]

it works for me!

lutcraft avatar Oct 13 '23 08:10 lutcraft

gdbgui 0.15.2.0 has been released which should fix this issue

cs01 avatar Oct 18 '23 17:10 cs01