pyelftools
Python 3: Some labels are bytes, others are str.
With Python 3, labels are sometimes of type str and sometimes of type bytes. For instance, the value of a DT_NEEDED tagged item might be bytes in one case and str in another.
Since they are labels, they should always be str, never bytes. As it stands, one must check and convert each label before doing any kind of string operation on it.
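To make the check-and-convert burden concrete, here is a minimal sketch of the kind of wrapper a caller ends up writing. The name `as_text` is hypothetical and not part of pyelftools, and UTF-8 is an assumption about how the labels were encoded.

```python
def as_text(label, encoding="utf-8"):
    """Return label as str, decoding it first if it arrived as bytes.

    Hypothetical helper, not a pyelftools API. UTF-8 is an assumption.
    """
    if isinstance(label, (bytes, bytearray)):
        # errors="replace" keeps undecodable bytes visible as U+FFFD
        # instead of raising UnicodeDecodeError.
        return bytes(label).decode(encoding, errors="replace")
    return label


# Mixing the two types fails in Python 3: bytes methods reject str arguments.
try:
    b"libc.so.6".startswith("libc")
except TypeError as exc:
    print("bytes label:", exc)

# After normalizing, the same string operation works for either input type.
print(as_text(b"libc.so.6").startswith("libc"))
print(as_text("libc.so.6").startswith("libc"))
```

Every call site that touches a label needs a wrapper like this, which is the inconvenience being reported.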
Hi, I would like to fix the problem you raised. Could you provide more specific information, for example a way to reproduce the problem or a sample binary? Thanks.
Sorry for the delay, I had my email notifications turned off. I ended up switching to Quarkslab's LIEF instead since that better fit my needs.
I was using Ubuntu 16.04 and the https://github.com/angr framework which uses pyelftools (without modification). Seems like any native Ubuntu executable had the same problem.
I'm pretty sure what needs to be done is to make sure all label endpoints do a str() to make sure they are a string and not a byte array before returning ELF containers.
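One caveat worth spelling out for the str() idea: in Python 3, calling str() on a bytes object without an encoding does not decode it. It returns the repr, including the b prefix and quotes. A short sketch of the difference, using a made-up label value:

```python
raw = b"libc.so.6"  # example value, standing in for a label returned as bytes

# str() without an encoding returns the repr of the bytes object.
as_repr = str(raw)

# Passing an encoding to str(), or calling .decode(), actually decodes.
decoded_with_str = str(raw, "utf-8")
decoded = raw.decode("utf-8")

print(as_repr)
print(decoded)
```

So a fix along these lines would need an explicit decode with a chosen encoding rather than a bare str() call.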
I agree in spirit, but won't that be a breaking change for those consumers that expect bytes in a certain context?
They are labels, though, so it would make more sense if they were always strings, or at least always the same type, to be consistent. It's been a while, so it's hard to remember all the context.
I'd wait for @eliben to chime in.
(hi! angr dev here)
Unfortunately, the inconsistency between str and bytes seems to be embedded deeply enough that some interfaces will give you either type depending on unknown factors. The only way I am able to use pyelftools without modification (though we do some things which border on monkeypatching) is with functions like this. Whenever I contribute code here I try to keep things sane, but on some level this is an issue with the ELF format. String tables are technically just null-terminated bytestrings, but compilers which take Unicode source code will encode names as UTF-8. However, this isn't guaranteed by any means, and a malicious program could contain strings which are invalid UTF-8 to try to crash the decoder.
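As an illustration of the string table point, here is a rough sketch of reading one null-terminated entry and decoding it without letting invalid UTF-8 raise. The helper `read_strtab_entry` and the sample table are hypothetical; this is not how pyelftools or angr implement it. It assumes UTF-8 and uses the standard `surrogateescape` error handler so the original bytes stay recoverable.

```python
def read_strtab_entry(strtab: bytes, offset: int) -> str:
    """Decode the null-terminated entry starting at offset.

    Hypothetical helper for illustration only.
    """
    end = strtab.find(b"\x00", offset)
    if end == -1:
        end = len(strtab)  # tolerate a missing terminator
    # surrogateescape maps each undecodable byte to a lone surrogate,
    # so decoding never raises and the raw bytes can be restored later.
    return strtab[offset:end].decode("utf-8", errors="surrogateescape")


# Made-up string table: one ordinary name and one with an invalid 0xFF byte.
strtab = b"\x00libc.so.6\x00bad\xffname\x00"

good = read_strtab_entry(strtab, 1)
odd = read_strtab_entry(strtab, 11)

# Round trip: encoding with the same handler yields the original bytes.
restored = odd.encode("utf-8", errors="surrogateescape")
```

The trade-off is that a string containing lone surrogates cannot be encoded or printed with a strict codec, so this is only one of several possible safeguards.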
It would be nice for anything that comes out of a string table to be a str instead of bytes, but it would require a lot of safeguards.
Related: https://github.com/eliben/pyelftools/issues/173
.decode('utf-8', errors='ignore') is your friend.
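For comparison, a small sketch of how that error handler behaves next to the alternatives, on a made-up byte string containing one invalid byte:

```python
# Valid UTF-8 for "café" followed by a stray 0xFF byte (example data).
raw = b"caf\xc3\xa9\xff.so"

# errors="ignore" silently drops the bytes it cannot decode.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# The default (strict) handler raises instead.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Caveat: with "ignore", distinct byte strings can decode to the same text.
collision = b"a\xffb".decode("utf-8", errors="ignore") == b"ab".decode("utf-8")
```

Dropping bytes avoids the crash, but two different names in a binary can then look identical, which may matter when labels are used as lookup keys.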
An additional thought is that this will be easier to handle once we drop Python 2 support, but that's not currently on the roadmap. At some future date, perhaps.
Related: #415
Related: #286, #279