Profiling CLE, pyelftools, and pefile
Loading binaries has been taking longer and longer since recent updates to CLE, pyelftools, and pefile. Profiling them is the first step toward making things faster.
Here are some preliminary findings:
- The hottest functions for ELF loading are clemory.load and pyelftools' struct._parse (see the profiling sketch after this list).
- For the former, I was able to shave a few milliseconds off.
- The latter is going to be very hard. pyelftools has a very intricate struct parsing mechanism, and I can't imagine making any changes to it without bringing down a house of cards. The one change I could see improving things is somehow getting pyelftools to use clemory.unpack_word directly for its word unpacking when it is using a clemory as a stream, instead of reading the word out as bytes and then unpacking those. I have no idea what percentage of the struct parsing is done over a clemory vs the binary stream, so take this with a big old grain of salt.
- How much time is spent on the various aspects of ELF parsing will obviously vary from binary to binary, but on the one I was testing, relocation parsing was the most intensive part. Note that this is just the parsing, not performing the relocations, which actually takes relatively little time by comparison. Because of this, another change I made was to disable relocation parsing whenever relocation performing is disabled. This removes our ability to introspect a binary's relocations without also performing them, but IMO that is an okay tradeoff, since it is a noticeable speed improvement for large binaries.
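A minimal sketch of the kind of profiling run that produces numbers like the above (not necessarily the exact setup used here; the binary path is a placeholder):

```python
import cProfile
import pstats

import cle

BINARY = "/usr/bin/asterisk"  # placeholder: any large ELF will do

profiler = cProfile.Profile()
profiler.enable()
cle.Loader(BINARY, auto_load_libs=False)  # load just the one object
profiler.disable()

# clemory.load and pyelftools' struct parsing should show up near the top.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
```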
I profiled PE loading back in summer 2017 and found that the same thing applies to pefile as to pyelftools: the hot functions are all struct parsing, and that code is already highly optimized. The big difference between our use of pefile and pyelftools is that we use pefile much more as a monolith, whereas we use pyelftools as a parsing toolkit. It might be possible to remove some unnecessary parsing if we look more carefully into how to use pefile efficiently.
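As a sketch of what "using pefile efficiently" could look like (I have not checked which directories CLE actually needs, and the filename is a placeholder), pefile's fast_load mode parses only the headers and lets you parse specific data directories afterwards:

```python
import pefile

# fast_load=True skips parsing the data directories at construction time.
pe = pefile.PE("example.exe", fast_load=True)

# Afterwards, parse only the directories we actually use instead of all of them.
pe.parse_data_directories(directories=[
    pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_IMPORT"],
    pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"],
    pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_BASERELOC"],
])
```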
Are you using load_debug_info=True? If so, are you using the latest pyelftools master? A recent PR added a cache for DIEs, I believe, which sped up DWARF loading a lot for me.
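For concreteness, this is the kind of setup I mean (the path is a placeholder, and I'm assuming the flag is handed to the ELF backend via main_opts):

```python
import elftools
import cle

# Needs a recent enough pyelftools for the DIE cache mentioned above.
print(elftools.__version__)

# Assumption: load_debug_info is passed through main_opts to the ELF backend.
ld = cle.Loader("/usr/bin/asterisk", main_opts={"load_debug_info": True})
```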
I was thinking of having CLE monkeypatch the struct loading code in pyelftools with a C-backed implementation. What do you think?
All of my tests were with load_debug_info=False. I think your idea might work, but we would need to read the entire file into memory first, and I don't really know how we would keep track of that.
Also, which level of abstraction were you thinking of monkeypatching pyelftools at? I can't seem to find a level in between "redo the whole gigantic mess" and "so small that I don't think it would help anything".
I'm thinking of moving elftools/common/construct_utils.py into C.
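Very roughly, the wiring from CLE's side could look like this; `_construct_utils_c` is a hypothetical C extension exposing the same public names as elftools/common/construct_utils.py:

```python
import importlib
import sys

# Hypothetical C-backed module with the same public API as
# elftools/common/construct_utils.py.
fast = importlib.import_module("_construct_utils_c")

# This has to run before anything imports the rest of pyelftools (CLE included),
# because the other elftools modules bind these names at import time via
# `from ..common.construct_utils import ...`.
sys.modules["elftools.common.construct_utils"] = fast

import cle  # noqa: E402 - pyelftools now picks up the fast module when CLE imports it

loader = cle.Loader("/bin/ls", auto_load_libs=False)
```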
This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.
One of the timeout binaries that we definitely want to be able to load: asterisk.zip