pyelftools

Python 3: Some labels are bytes, others are str.

Open BigJim opened this issue 6 years ago • 10 comments

With Python 3, labels are sometimes of type str and other times of type bytes. For instance, the label of a DT_NEEDED tagged item might be bytes in one case and str in another.

Since they are labels, they should always be str, never bytes. As it is, one must check and convert each label before doing any kind of string operation on it.

Apr 13 '18 17:04 BigJim
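For readers hitting the same problem, the check-and-convert step described in the opening post can be sketched roughly as follows. This is a minimal illustration, not part of pyelftools: `as_text` is a hypothetical helper name, and UTF-8 is an assumed encoding.

```python
def as_text(label, encoding="utf-8"):
    """Return `label` as str whether it arrived as bytes or as str.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(label, bytes):
        return label.decode(encoding)
    return label


# A mixed list, standing in for labels that come back in either type.
labels = [b"libc.so.6", "libm.so.6"]
normalized = [as_text(label) for label in labels]
```

Every call site needs this kind of wrapper while the return type is inconsistent, which is the burden the issue is about.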

Hi, I'd like to fix the problem you raised. Could you provide more specific information? For example, a way to reproduce the problem, or a binary that triggers it. Thanks.

Apr 28 '18 15:04 junsooo

Sorry for the delay, I had my email notifications turned off. I ended up switching to Quarkslab's LIEF instead since that better fit my needs.

I was using Ubuntu 16.04 and the https://github.com/angr framework which uses pyelftools (without modification). Seems like any native Ubuntu executable had the same problem.

I'm pretty sure what needs to be done is for all label endpoints to do a str() conversion, so that labels are strings and not byte arrays, before returning ELF containers.

Jul 11 '18 18:07 BigJim
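One caveat on the str() suggestion: in Python 3, calling `str()` on a bytes object with no encoding argument returns its repr rather than decoding it. The short sketch below shows the difference; the sample value is made up for illustration.

```python
raw = b"libc.so.6"

# str() with no encoding returns the repr of the bytes object.
wrong = str(raw)

# Either of these actually decodes the bytes into text.
right = raw.decode("utf-8")
also_right = str(raw, "utf-8")
```

So a fix along these lines would need an explicit decode with a chosen encoding, not a bare `str()` call.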

I agree in spirit, but won't that be a breaking change for those consumers that expect bytes in a certain context?

Jan 14 '20 17:01 sevaa

They are labels, though, so it would make more sense if they were always strings, or at least all the same type, to be consistent. It's been a while; hard to remember all the context.

Jan 15 '20 04:01 BigJim

I'd wait for @eliben to chime in.

Jan 15 '20 14:01 sevaa

(hi! angr dev here)

Unfortunately, the inconsistency between str and bytes seems to be embedded deeply enough that some interfaces will give you either type depending on unknown factors. The only way I am able to use pyelftools without modification (though we do some things which border on monkeypatching) is with functions like this. Whenever I contribute code here I try to keep things sane, but on some level this is an issue with the ELF format. String tables are technically just null-terminated bytestrings, but compilers which take Unicode source code will encode names as UTF-8. However, this isn't guaranteed by any means, and a malicious program could contain strings that are invalid UTF-8 to try to crash the decoder.

It would be nice for anything that comes out of a string table to be a str instead of bytes, but it would require a lot of safeguards.

Related: https://github.com/eliben/pyelftools/issues/173

Feb 14 '20 22:02 rhelmot
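To make the string-table point concrete, here is a rough sketch of reading a null-terminated entry from a raw table and decoding it without crashing on invalid UTF-8. It does not use pyelftools; `read_strtab_entry` and the sample table are hypothetical, and the `surrogateescape` error handler is one possible safeguard among several.

```python
def read_strtab_entry(strtab: bytes, offset: int) -> str:
    """Decode the null-terminated entry starting at `offset`.

    Hypothetical helper. Invalid bytes are mapped to lone surrogates
    by `surrogateescape`, so decoding never raises and the original
    bytes can be recovered later.
    """
    end = strtab.find(b"\x00", offset)
    if end == -1:  # unterminated final entry: read to the end of the table
        end = len(strtab)
    return strtab[offset:end].decode("utf-8", errors="surrogateescape")


# Made-up table: an empty entry, a normal name, and a name containing
# a byte (0xFF) that is never valid in UTF-8.
strtab = b"\x00libc.so.6\x00bad\xffname\x00"

good = read_strtab_entry(strtab, 1)
odd = read_strtab_entry(strtab, 11)

# The escaped string round-trips back to the exact original bytes.
recovered = odd.encode("utf-8", errors="surrogateescape")
```

The trade-off is that strings decoded this way can contain lone surrogates, which will fail if later encoded with the default strict handler (for example when printing).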

.decode('utf-8', errors='ignore') is your friend.

Feb 14 '20 22:02 sevaa
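For comparison, this is how the suggested `errors='ignore'` behaves next to the other standard handlers on input that is only partly valid UTF-8. The byte strings are made up for illustration.

```python
raw = b"caf\xc3\xa9\xff"  # "café" in UTF-8 followed by an invalid byte

ignored = raw.decode("utf-8", errors="ignore")    # invalid byte dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid byte -> U+FFFD

try:
    raw.decode("utf-8")  # default is errors="strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "ignore", two different byte strings can decode to the same text.
collision = b"a\xffb".decode("utf-8", errors="ignore") == b"ab".decode("utf-8")
```

`ignore` never raises, but it silently discards data, so two distinct names in a binary may become indistinguishable after decoding.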

An additional thought is that this will be easier to handle once we drop Python 2 support, but that's not currently on the roadmap. At some future date, perhaps.

Feb 15 '20 00:02 eliben

Related: #415

Jun 07 '22 22:06 eliben

Related: #286, #279

Jun 09 '22 00:06 sevaa