Preserve Analysis through reboot

Open AaronOpfer opened this issue 8 years ago • 10 comments

Analysis can be slow on some very large binaries. It would be nice if we could save the results of the analysis to disk.

To avoid loading stale analysis when a binary changes, we can compare the analysis timestamp against the binary's file timestamp at load time and discard the analysis if the binary is newer.
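
A minimal sketch of the timestamp check, assuming C++17 `std::filesystem` and a separate on-disk cache file per binary. The names `cache_is_fresh` and `analysis_is_fresh` are hypothetical and not part of edb:

```cpp
#include <chrono>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Pure comparison: the cached analysis is usable only if it was written
// no earlier than the binary's last modification.
bool cache_is_fresh(fs::file_time_type binary_mtime, fs::file_time_type cache_mtime) {
    return binary_mtime <= cache_mtime;
}

// Hypothetical wrapper: stat both files and discard the cache on any error
// (missing binary, missing cache file, permission problems).
bool analysis_is_fresh(const fs::path &binary, const fs::path &cache_file) {
    std::error_code ec;
    const fs::file_time_type binary_mtime = fs::last_write_time(binary, ec);
    if (ec) {
        return false;
    }
    const fs::file_time_type cache_mtime = fs::last_write_time(cache_file, ec);
    if (ec) {
        return false;
    }
    return cache_is_fresh(binary_mtime, cache_mtime);
}
```

Using `<=` treats equal timestamps as fresh, so on filesystems with coarse timestamp resolution this errs toward reusing the cache.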



AaronOpfer • Nov 24 '16 08:11

A hash sum seems more robust than a timestamp.

10110111 • Nov 24 '16 09:11

You're right, but it could be expensive on binaries with lots of embedded resources. Maybe it should be opt-in.

AaronOpfer • Nov 29 '16 15:11

The analyzer already computes an MD5 hash of every region it analyzes (in particular, to detect changes), so that could be used directly. Fortunately, hashing hasn't proven to be particularly time-consuming so far.
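
A sketch of reusing a per-region digest to decide whether saved analysis is still valid. The analyzer uses MD5; because the C++ standard library has no MD5, this sketch swaps in 64-bit FNV-1a purely as a dependency-free stand-in (FNV-1a is not a cryptographic hash). `SavedDigests` and `can_reuse_analysis` are hypothetical names:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in digest: 64-bit FNV-1a (NOT the MD5 the analyzer actually uses).
std::uint64_t fnv1a64(const std::vector<std::uint8_t> &bytes) {
    std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-64 offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 0x100000001b3ULL;               // FNV-64 prime
    }
    return h;
}

// Hypothetical on-disk record: region name -> digest saved with its analysis.
using SavedDigests = std::unordered_map<std::string, std::uint64_t>;

// Reuse the saved analysis only if a digest was stored for this region
// and the region's current bytes still hash to the same value.
bool can_reuse_analysis(const SavedDigests &saved,
                        const std::string &region,
                        const std::vector<std::uint8_t> &current_bytes) {
    const auto it = saved.find(region);
    return it != saved.end() && it->second == fnv1a64(current_bytes);
}
```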

eteran • Nov 30 '16 20:11

I think the first step would be to make the analysis data store addresses relative to the module/region base instead of as absolute addresses, as it does currently. That would make saving and restoring much simpler when ASLR is involved.

eteran • Dec 01 '16 20:12

We could do that, but it would probably be better to just save the base address alongside the absolute addresses so that we can do corrections at load-time.
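
A sketch of the load-time correction, assuming the saved data records the module base next to the absolute addresses (`SavedModuleAnalysis` and `rebase` are hypothetical). Subtracting the saved base yields exactly the base-relative offset suggested above, so both approaches carry the same information:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical saved record: module base at analysis time plus the
// absolute function entry addresses found during that run.
struct SavedModuleAnalysis {
    std::uint64_t base;
    std::vector<std::uint64_t> function_entries;
};

// Translate saved absolute addresses to the module's current load base.
// Unsigned arithmetic wraps modulo 2^64, so this works whether the module
// moved to a higher or a lower address.
std::vector<std::uint64_t> rebase(const SavedModuleAnalysis &saved, std::uint64_t current_base) {
    std::vector<std::uint64_t> out;
    out.reserve(saved.function_entries.size());
    for (std::uint64_t addr : saved.function_entries) {
        out.push_back(addr - saved.base + current_base); // offset + new base
    }
    return out;
}
```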

AaronOpfer • Dec 06 '16 16:12

Sure, that would work equally well.

eteran • Dec 06 '16 17:12

Using the MD5 sum to determine whether the analysis is still relevant might be a problem for binaries that use relocation tables. Applying relocations patches the loaded image, so a relocated binary will have a different hash each time it is loaded at a different address.

AaronOpfer • Dec 21 '16 01:12

Well, I think that will generally be a problem for any solution that is based on "did the data in this region change". I am of course open to alternatives.

BTW, do you know if my push fixed #528?

eteran • Dec 21 '16 01:12

Any suggestions on what we should serialize? Serializing everything looks impractical, since Function objects hold vectors of BasicBlocks, which in turn hold vectors of instructions. Literally writing every instruction to a file seems redundant and would probably be slower than just redoing the analysis.

It seems like the most benefit would come from skipping the most expensive parts of the analysis. The fuzzy analysis and basic block steps essentially record each function's start address, end address and reference count (expensive), and then disassemble and store all of those functions' instructions (cheap-ish). If we serialized only the expensive information, we might be able to recreate the Function and BasicBlock objects and their disassembled instructions faster than a full analysis would.

Does this seem like a reasonable approach? I don't want to go too far off the deep end before confirming it.
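
A sketch of the minimal approach described above: persist only each function's start address, end address and reference count, then rebuild basic blocks and instructions by re-disassembling each range on load. `FunctionRecord`, `save_functions` and `load_functions` are hypothetical, and the whitespace-separated text format is an assumption chosen for readability; the sketch round-trips through any stream, e.g. a `std::stringstream`:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical per-function record holding only the expensive-to-discover facts.
// Basic blocks and instructions would be rebuilt by disassembling [start, end).
struct FunctionRecord {
    std::uint64_t start;
    std::uint64_t end;
    int reference_count;
};

// One record per line: "start end refcount", addresses in hex, count in decimal.
void save_functions(std::ostream &os, const std::vector<FunctionRecord> &fns) {
    for (const auto &f : fns) {
        os << std::hex << f.start << ' ' << f.end << ' '
           << std::dec << f.reference_count << '\n';
    }
}

// Read records back until the stream is exhausted or a line is malformed.
std::vector<FunctionRecord> load_functions(std::istream &is) {
    std::vector<FunctionRecord> fns;
    FunctionRecord f{};
    while (is >> std::hex >> f.start >> f.end >> std::dec >> f.reference_count) {
        fns.push_back(f);
    }
    return fns;
}
```

Storing more per function later would only mean adding fields to the struct and to the two stream loops.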

AaronOpfer • Dec 21 '16 02:12

I'll take a look at it and get back to you. But we should probably lean more towards the "store too much" over potentially storing too little.

eteran • Dec 21 '16 02:12