UTF-8 decode error for section names
Version and Platform (required):
- Binary Ninja Version: 4.2.5872-dev
- OS: Arch Linux
- OS Version: -
- CPU Architecture: x64
Bug Description: Can't access the sections property of BinaryView because a section name contains bytes that are not valid UTF-8.
Steps To Reproduce:
Python 3.10.14 (main, May 8 2024, 21:13:45) [GCC 13.2.1 20240417] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from binaryninja import load
>>> bv = load("2d101cb5e071b57f48d93ad4cf1caa07199693d8073802209e6bf6e5a9188eb9")
[7254317515725976323:4120572419543564517 BinaryView.PEView error] Failed to parse COFF symbol table: invalid COFF string table size
[7254317515725976323:4120572419543564517 BinaryView.PEView warn] The number of Import_Directory_Table reported by the Data Directories is different from its correct amount. There are actually 3 Import_Directory_Table in the file, but SizeOfImportTable reports 57. The PE parsing continues with the actual number of Import_Directory_Table
[7254317515725976323:4120572419543564517 BinaryView.PEView warn] Failed to parse relocation directory: read out of bounds
>>> bv.sections
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/binaryninja/python/binaryninja/binaryview.py", line 3265, in sections
result[core.BNSectionGetName(section_list[i])] = Section(section_handle)
File "/opt/binaryninja/python/binaryninja/_binaryninjacore.py", line 56897, in BNSectionGetName
string = str(pyNativeStr(casted))
File "/opt/binaryninja/python/binaryninja/_binaryninjacore.py", line 36, in pyNativeStr
return arg.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte
Expected Behavior: Accessing the sections property should return the section information. We don't really need the section names; the ranges and flags would be enough.
Additional Information: Binary: 2d101cb5e071b57f48d93ad4cf1caa07199693d8073802209e6bf6e5a9188eb9.zip (Caution: Malware, PW: infected)
Thanks for filing this bug report! I have personally run into this before, but I forgot to create an issue for it and just let it slip.
Possible remediation: we should make pyNativeStr catch the UnicodeDecodeError and return the stringified bytes instead. This would allow all code to "just work", and special handling would only be needed when this edge case is hit. It would solve the issue wherever raw strings are recovered from BinaryViews. The generator currently emits:
fprintf(out, "def pyNativeStr(arg: AnyStr) -> str:\n");
fprintf(out, " if isinstance(arg, str):\n");
fprintf(out, " return arg\n");
fprintf(out, " else:\n");
fprintf(out, " return arg.decode('utf8')\n\n\n");
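For illustration, the generated Python helper with the proposed fallback might look roughly like the sketch below. This is only an assumption about how the remediation could be written, not the change that actually shipped; in particular, using repr() as the "stringified bytes" form is a choice made here for the example.

```python
from typing import AnyStr


def pyNativeStr(arg: AnyStr) -> str:
    # Strings pass through unchanged, as in the existing helper.
    if isinstance(arg, str):
        return arg
    try:
        return arg.decode('utf8')
    except UnicodeDecodeError:
        # Assumed fallback: return the bytes literal form, e.g. "b'\\x85abc'",
        # so callers still receive a str instead of an exception.
        return repr(arg)
```

With this sketch, a section name that starts with byte 0x85 would come back as an escaped bytes literal rather than raising, so bv.sections could still build its dictionary. The trade-off is that such keys are no longer the raw name, so any code that looks sections up by name would need to expect the escaped form.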
This was due to us failing to correctly handle an invalid string table when resolving section names.
Fixed in 4.2.6073-dev