cle icon indicating copy to clipboard operation
cle copied to clipboard

Fails to load PE binaries with non-utf-8 decodable bytes in section name

Open AlexVanMechelen opened this issue 2 years ago • 8 comments

Description

Loading a PE binary with non-utf-8 decodable bytes in the section name of one of its sections causes a crash here

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 6: invalid continuation byte

Could add errors='ignore' flag to .decode() to drop non-utf-8 decodable bytes

AlexVanMechelen avatar Nov 08 '23 19:11 AlexVanMechelen

@rhelmot I've been thinking about this for a while. Should we use bytes instead of str for section and segment names?

ltfish avatar Nov 08 '23 19:11 ltfish

My question for OP is: are your section names encoded in some other encoding, or are they garbage?

rhelmot avatar Nov 08 '23 20:11 rhelmot

@rhelmot The section names are garbage. Some executable packers create such garbage section names, leading to the above error for all executables packed with them.

AlexVanMechelen avatar Nov 08 '23 21:11 AlexVanMechelen

Does any compiler support generating utf-8 section names? If so, I would recommend adding the error-replace utf-8 decoding. If not, Latin-1.

rhelmot avatar Nov 08 '23 23:11 rhelmot

@rhelmot Some malware intentionally makes their section names garbage. I don't think we want to fail to load those binaries in such cases.

ltfish avatar Nov 09 '23 00:11 ltfish

Neither of those solutions will fail with garbage bytes.

rhelmot avatar Nov 09 '23 00:11 rhelmot

Why don't we default to latin-1?

ltfish avatar Nov 09 '23 05:11 ltfish

That's why I asked the question about whether compilers let you generate utf8 section names manually

rhelmot avatar Nov 09 '23 05:11 rhelmot