binaryninja-api UTF-8 decode error for section names

Version and Platform (required):

Binary Ninja Version: 4.2.5872-dev
OS: Arch Linux
OS Version: -
CPU Architecture: x64

Bug Description: Can't access the sections property of BinaryView because a section name contains bytes that are not valid UTF-8.

Steps To Reproduce:

Python 3.10.14 (main, May  8 2024, 21:13:45) [GCC 13.2.1 20240417] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from binaryninja import load
>>> bv = load("2d101cb5e071b57f48d93ad4cf1caa07199693d8073802209e6bf6e5a9188eb9")
[7254317515725976323:4120572419543564517 BinaryView.PEView error] Failed to parse COFF symbol table: invalid COFF string table size
[7254317515725976323:4120572419543564517 BinaryView.PEView warn] The number of Import_Directory_Table reported by the Data Directories is different from its correct amount. There are actually 3 Import_Directory_Table in the file, but SizeOfImportTable reports 57. The PE parsing continues with the actual number of Import_Directory_Table
[7254317515725976323:4120572419543564517 BinaryView.PEView warn] Failed to parse relocation directory: read out of bounds
>>> bv.sections
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/binaryninja/python/binaryninja/binaryview.py", line 3265, in sections
    result[core.BNSectionGetName(section_list[i])] = Section(section_handle)
  File "/opt/binaryninja/python/binaryninja/_binaryninjacore.py", line 56897, in BNSectionGetName
    string = str(pyNativeStr(casted))
  File "/opt/binaryninja/python/binaryninja/_binaryninjacore.py", line 36, in pyNativeStr
    return arg.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte
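
The decode failure itself can be reproduced without Binary Ninja. The snippet below is only a sketch: the byte string is a hypothetical stand-in for the section name, and only the leading 0x85 byte is taken from the traceback above. 0x85 is a UTF-8 continuation byte, so it can never start a character.

```python
# Hypothetical section name bytes; only the leading 0x85 comes from the traceback.
raw = b"\x85text"

message = None
try:
    raw.decode("utf8")
except UnicodeDecodeError as exc:
    # Keep the message, since `exc` is unbound once the except block ends.
    message = str(exc)

print(message)
```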

Expected Behavior: The sections property should be accessible so we can read section information. We don't really need the section name; the ranges and flags would be enough.

Additional Information: Binary: 2d101cb5e071b57f48d93ad4cf1caa07199693d8073802209e6bf6e5a9188eb9.zip (Caution: Malware, PW: infected)

Aug 14 '24 13:08 NeoQuix

Thanks for filing this bug report! I have personally run into this before, but I forgot to create an issue for it and just let it slip.

Aug 14 '24 13:08 xusheng6

Possible remediation: we should make pyNativeStr catch the UnicodeDecodeError and return stringified bytes instead. This would allow all code to "just work", and special handling would only be needed when this edge case is hit. This would solve the issue wherever raw strings are recovered from BinaryViews. The generator currently emits:

	fprintf(out, "def pyNativeStr(arg: AnyStr) -> str:\n");
	fprintf(out, "	if isinstance(arg, str):\n");
	fprintf(out, "		return arg\n");
	fprintf(out, "	else:\n");
	fprintf(out, "		return arg.decode('utf8')\n\n\n");
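
As a rough sketch of that remediation, the generated Python function might look like the following. This is an assumption about the shape of the change, not what was actually merged; in particular, using repr() to produce the "stringified bytes" is my choice here, and the type annotation is simplified from AnyStr.

```python
from typing import Union


def pyNativeStr(arg: Union[str, bytes]) -> str:
    """Return arg as str, falling back to a bytes repr if it is not valid UTF-8."""
    if isinstance(arg, str):
        return arg
    try:
        return arg.decode('utf8')
    except UnicodeDecodeError:
        # Assumed fallback: keep the raw bytes visible as a repr string,
        # e.g. b'\x85text' becomes the str "b'\\x85text'".
        return repr(arg)
```

With this fallback, callers that only need ranges and flags keep working, while code that relies on the name sees a string that is clearly marked as raw bytes.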

Aug 20 '24 14:08 plafosse

This was caused by our failure to correctly handle an invalid string table when resolving section names. Fixed in 4.2.6073-dev.

Sep 16 '24 16:09 negasora