obstack support
Multiple coreutils samples use the obstack functions for memory management. obstack takes two function pointers, an allocator and a free function, which obstack_init (a macro that just calls _obstack_begin) passes along. Typically malloc and free are passed in as these functions.
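For reference, here is a minimal sketch of the usual obstack setup, assuming glibc's <obstack.h> is available. The function name obstack_roundtrip is made up for illustration and is not taken from coreutils.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack_init reads these two macros to find its chunk allocator and
   deallocator. Defining them as malloc and free is the usual setup. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: copy a string into a fresh obstack, compare it with
   the input, then release the whole obstack. Returns 1 if the copy matches. */
int obstack_roundtrip(const char *text)
{
    assert(text != NULL);

    struct obstack pool;
    /* Expands to a call to _obstack_begin that receives malloc and free
       as function pointers. */
    obstack_init(&pool);

    /* obstack_copy0 copies the bytes and appends a terminating NUL. */
    char *copy = obstack_copy0(&pool, text, strlen(text));
    int same = strcmp(copy, text) == 0;

    /* Passing NULL frees every object and chunk in the obstack. */
    obstack_free(&pool, NULL);
    return same;
}
```

The two function pointers handed to _obstack_begin here are exactly the "callbacks" discussed in this issue.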
The problem is that external callbacks do not currently work in binrec, so the "callbacks" (malloc and free) are not lifted and the recovered binary segfaults in the call to _obstack_begin. Disassembly from a recovered binary:
0x090499f7 <+615>: push 0x8048cd0 ; original free (not lifted)
0x090499fc <+620>: push 0x8048e40 ; original malloc (not lifted)
0x09049a01 <+625>: push 0x0
0x09049a03 <+627>: push 0x0
0x09049a05 <+629>: push 0x80521e0 ; struct obstack*
0x09049a0a <+634>: call 0x9049060 <_obstack_begin@plt>
We could add support for this specific use case of obstack within a new binrec lift pass:
- Identify calls to _obstack_begin.
- Replace the arguments for the callbacks with addresses of free and malloc.
This pass would need to be run after library functions have been identified and externalized so that they can be resolved to an address (most likely via ptrtoint).
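As a rough sketch of the rewrite step, here is a toy model in plain C. Everything in it is hypothetical: struct call_site and patch_obstack_callbacks are stand-ins invented for illustration, and a real pass would walk LLVM call instructions and resolve the addresses of the externalized malloc and free rather than take them as integers. The argument positions follow the push order in the disassembly above (obstack pointer, two integers, allocator, free).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a lifted call site. */
struct call_site {
    const char *callee;     /* name of the external function being called */
    unsigned long args[5];  /* argument values as they appear in the lifted code */
    size_t nargs;           /* number of valid entries in args */
};

/* _obstack_begin(h, size, alignment, chunk alloc, chunk free):
   the two callbacks are arguments 3 and 4 (zero-based). */
enum { OBSTACK_ALLOC_ARG = 3, OBSTACK_FREE_ARG = 4 };

/* Rewrite every _obstack_begin call so that its callbacks point at the
   resolved malloc and free. Returns the number of call sites patched. */
size_t patch_obstack_callbacks(struct call_site *calls, size_t count,
                               unsigned long malloc_addr,
                               unsigned long free_addr)
{
    assert(calls != NULL || count == 0);

    size_t patched = 0;
    for (size_t i = 0; i < count; i++) {
        /* Skip anything that is not a five-argument _obstack_begin call. */
        if (strcmp(calls[i].callee, "_obstack_begin") != 0 ||
            calls[i].nargs != 5)
            continue;
        calls[i].args[OBSTACK_ALLOC_ARG] = malloc_addr;
        calls[i].args[OBSTACK_FREE_ARG] = free_addr;
        patched++;
    }
    return patched;
}
```

Note that this overwrites the callback arguments unconditionally, so any custom allocator passed to _obstack_begin would be replaced as well.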
I've confirmed that three coreutils samples use obstack:
- ls
- tac
- dircolors
I think this is a solid plan for now. We should create an issue to go back and undo this (and the atexit equivalent) if/when we add support for external callbacks.
At worst, we risk replacing custom obstack allocators with malloc and free. I don't think custom allocators are common enough that BinRec needs to worry about supporting them at this point. This should not be a soundness concern, only a performance one.
Related issue: trailofbits/binrec-prerelease#210. Several of these binaries fail due to lack of support for callbacks, and we need a method for handling this. Since callbacks are fairly rare, I wonder if we could handle them the way we handle libc functions. Specifically, we could create a separate data file listing known functions that register callbacks. A new pass would look for calls to those functions and replace the function pointer parameter with a reference to a trampoline function that calls the appropriate callback.
In this way, the library or OS gets a function pointer it can rely on, and we implement the callback using existing BinRec infrastructure that identifies external function calls, inserts helper trampolines, etc. The downside is that support is limited to functions we generate data for, but I think we could realistically enumerate them and add support for new libraries easily.
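To make the data-file idea a bit more concrete, here is a rough sketch in C of what the lookup could look like. The table layout, struct callback_spec, and find_callback_spec are all assumptions made for illustration; the actual data file format is undecided. The signal and qsort entries are only examples of other libc functions that take a callback and are not known failures from the related issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry of the data file: a function that accepts callbacks,
   and which of its parameters are function pointers. */
struct callback_spec {
    const char *function;
    int callback_args[2];  /* zero-based parameter indices, -1 marks unused */
};

/* Illustrative contents only. A real list would be generated per library. */
static const struct callback_spec known_callbacks[] = {
    { "atexit",         { 0, -1 } },  /* atexit(handler) */
    { "signal",         { 1, -1 } },  /* signal(signum, handler) */
    { "qsort",          { 3, -1 } },  /* qsort(base, nmemb, size, compar) */
    { "_obstack_begin", { 3,  4 } },  /* (h, size, align, alloc, free) */
};

/* Return the entry for a function name, or NULL if the function is not
   known to take callbacks. */
const struct callback_spec *find_callback_spec(const char *function)
{
    assert(function != NULL);

    size_t n = sizeof known_callbacks / sizeof known_callbacks[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(known_callbacks[i].function, function) == 0)
            return &known_callbacks[i];
    }
    return NULL;
}
```

A pass built on such a table would look up each external call and, for every listed parameter index, substitute a trampoline for the original function pointer.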