edb-debugger Preserve Analysis through reboot

Analysis can be slow on some very large binaries. It would be nice if we could save the results of the analysis to disk.

To prevent loading stale analysis when a binary changes, we can compare the analysis timestamp against the binary's file timestamp at load time, and discard the analysis if the binary is newer.


Nov 24 '16 08:11 AaronOpfer

A hash sum seems more robust than a timestamp.

Nov 24 '16 09:11 10110111

You're right, but it could be expensive on binaries with lots of embedded resources. Maybe it should be opt-in.

Nov 29 '16 15:11 AaronOpfer

The analyzer already does an MD5 of every region it analyzes (in particular to detect changes), so that could be used directly. Fortunately, it hasn't proven to be particularly time consuming yet.

Nov 30 '16 20:11 eteran

I think the first step in this would be to make the analysis data store addresses relative to the module/region base instead of absolute, as it does currently. That would make saving/restoring much simpler when ASLR is involved.

Dec 01 '16 20:12 eteran

We could do that, but it would probably be better to just save the base address alongside the absolute addresses so that we can do corrections at load-time.
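A rough sketch of that load-time correction, assuming the saved data is just a base address plus a list of absolute function addresses. `SavedAnalysis` and `rebase` are hypothetical names for illustration, not edb types.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical on-disk view of an analysis result.
struct SavedAnalysis {
    std::uint64_t base;                   // module base when the analysis was saved
    std::vector<std::uint64_t> functions; // absolute function start addresses
};

// Shifts every saved address by the difference between the current base and
// the saved base. Unsigned arithmetic wraps modulo 2^64, so the same code
// handles a module that moved to a lower address.
std::vector<std::uint64_t> rebase(const SavedAnalysis& saved, std::uint64_t current_base) {
    const std::uint64_t delta = current_base - saved.base;
    std::vector<std::uint64_t> corrected;
    corrected.reserve(saved.functions.size());
    for (std::uint64_t address : saved.functions) {
        corrected.push_back(address + delta);
    }
    return corrected;
}
```

Storing relative offsets instead would amount to the same arithmetic with `saved.base` subtracted at save time rather than at load time.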

Dec 06 '16 16:12 AaronOpfer

Sure, that would work equally well.

Dec 06 '16 17:12 eteran

Using the MD5 sum to determine whether analysis is still relevant might be a problem for binaries that use relocation tables. Applying relocations patches addresses in the loaded image, so a relocated binary will have a different hash each time it loads at a different base.
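To illustrate the concern, here is a small sketch in which the same code bytes hash differently once an embedded absolute pointer is patched for a new base. FNV-1a is used only as a simple stand-in for MD5, and `relocated_region` is a made-up model of a loaded region, not how edb reads memory.

```cpp
#include <cstdint>
#include <vector>

// FNV-1a (64-bit), standing in for the analyzer's MD5 in this sketch.
std::uint64_t fnv1a(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (std::uint8_t b : bytes) {
        hash ^= b;
        hash *= 1099511628211ULL; // FNV prime
    }
    return hash;
}

// Hypothetical region: the x86-64 opcode for "movabs rax, imm64" followed by
// an 8-byte absolute pointer that the loader patches to (base + 0x1000).
std::vector<std::uint8_t> relocated_region(std::uint64_t base) {
    std::vector<std::uint8_t> region = {0x48, 0xb8};
    const std::uint64_t target = base + 0x1000;
    for (int i = 0; i < 8; ++i) {
        // Append the pointer in little-endian byte order.
        region.push_back(static_cast<std::uint8_t>(target >> (8 * i)));
    }
    return region;
}
```

Hashing `relocated_region(0x400000)` and `relocated_region(0x500000)` gives different values even though the instruction is the same, which is the mismatch a saved hash would run into.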

Dec 21 '16 01:12 AaronOpfer

Well, I think that will generally be a problem for any solution that is based on "did the data in this region change". I am of course open to alternatives.

BTW, do you know if my push fixed #528?

Dec 21 '16 01:12 eteran

Any suggestions on what we should serialize? Serializing everything looks impractical since these function objects have vectors of basicblocks which have vectors of instructions. Literally writing all of the instructions into a file sounds pretty redundant and probably slower than real analysis.

The most benefit would probably come from skipping the most expensive parts of analysis. It seems like the fuzzy analysis and basic block steps essentially save every function's start address, end address and reference count (expensive), and then disassemble and save all of those functions' instructions (cheap-ish). If we could serialize that expensive information, we might be able to recreate Function and BasicBlock objects and their disassembled instructions more quickly than a full analysis would.

Does this seem like a reasonable approach? I don't want to get too far off the deep end before I confirm this is reasonable.

Dec 21 '16 02:12 AaronOpfer

I'll take a look at it and get back to you. But we should probably lean towards storing too much rather than risk storing too little.

Dec 21 '16 02:12 eteran
