cle
Fails to load PE binaries with non-utf-8 decodable bytes in section name
Description
Loading a PE binary whose section table contains a section name with bytes that cannot be decoded as UTF-8 causes a crash:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 6: invalid continuation byte
One possible fix is to pass errors='ignore' to .decode(), which drops the bytes that are not valid UTF-8.
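A minimal sketch of the failure and of the errors='ignore' workaround, for illustration only. The section name bytes and the helper `decode_section_name` are hypothetical and are not taken from cle's code; the example only assumes that the name is an 8-byte, NUL-padded field that gets decoded as UTF-8.

```python
# Hypothetical 8-byte PE section name with an invalid UTF-8 byte at position 6.
raw_name = b"UPX0ab\xd2A"


def decode_section_name(raw: bytes, errors: str = "strict") -> str:
    """Strip trailing NUL padding and decode the name as UTF-8.

    Illustrative helper only; this is not cle's actual implementation.
    """
    return raw.rstrip(b"\x00").decode("utf-8", errors=errors)


# Strict decoding raises, which is the crash reported above.
try:
    decode_section_name(raw_name)
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# errors="ignore" silently drops the undecodable byte instead of raising.
ignored = decode_section_name(raw_name, errors="ignore")

print(strict_error)
print(repr(ignored))
```

Note that errors='ignore' is lossy: two different garbage names can decode to the same string, so the original bytes cannot be recovered from the result.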
@rhelmot I've been thinking about this for a while. Should we use bytes instead of str for section and segment names?
My question for OP is: are your section names encoded in some other encoding, or are they garbage?
@rhelmot The section names are garbage. Some executable packers create such garbage section names, leading to the above error for all executables packed with them.
Does any compiler support generating UTF-8 section names? If so, I would recommend decoding as UTF-8 with errors='replace'. If not, Latin-1.
@rhelmot Some malware intentionally uses garbage section names. I don't think we want to fail to load those binaries in such cases.
Neither of those solutions will fail with garbage bytes.
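To illustrate that claim, here is a small sketch comparing the two options on every possible byte value. It uses only the standard `bytes.decode` behavior; the variable names are made up for the example and nothing here is taken from cle.

```python
# Every possible byte value, so both decoders see all kinds of garbage.
all_bytes = bytes(range(256))

# Option 1: UTF-8 with errors="replace". Never raises; invalid bytes
# become U+FFFD (the replacement character), so the result is lossy.
replaced = all_bytes.decode("utf-8", errors="replace")

# Option 2: Latin-1. Never raises; each byte maps to exactly one code
# point, so the original bytes can be recovered by re-encoding.
latin1 = all_bytes.decode("latin-1")

replace_round_trips = replaced.encode("utf-8") == all_bytes
latin1_round_trips = latin1.encode("latin-1") == all_bytes

print("replace round-trips:", replace_round_trips)
print("latin-1 round-trips:", latin1_round_trips)
```

The trade-off is that Latin-1 keeps names reversible but would mis-render any name that really is UTF-8, while errors='replace' renders valid UTF-8 names correctly but loses the garbage bytes.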
Why don't we default to latin-1?
That's why I asked whether compilers let you generate UTF-8 section names manually.