binrec-tob

obstack support

Open ameily opened this issue 3 years ago • 2 comments

Multiple coreutils samples use the obstack functions for memory management. obstack_init, a macro that just calls _obstack_begin, passes two function pointers to it: the allocator and the free function. Typically these are malloc and free.

The problem is that external callbacks do not currently work within binrec, so the "callbacks" (malloc and free) are not lifted and the recovered binary segfaults in the call to _obstack_begin. Example disassembly from a recovered binary:

   0x090499f7 <+615>:	push   0x8048cd0 ; original free (not lifted)
   0x090499fc <+620>:	push   0x8048e40 ; original malloc (not lifted)
   0x09049a01 <+625>:	push   0x0
   0x09049a03 <+627>:	push   0x0
   0x09049a05 <+629>:	push   0x80521e0 ; struct obstack*
   0x09049a0a <+634>:	call   0x9049060 <_obstack_begin@plt>

We could add support for this specific use case of obstack within a new binrec lift pass:

  1. Identify calls to _obstack_begin
  2. Replace the callback arguments with the addresses of malloc and free in the recovered binary.

This pass would need to be run after library functions have been identified and externalized so that they can be resolved to an address (most likely via ptrtoint).
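The two steps above can be sketched on a toy model. This is not BinRec code and does not use LLVM: `Call`, `rewrite_obstack_begin`, and the `externals` table are hypothetical names chosen for illustration. The sketch assumes the argument order seen in the disassembly (obstack pointer, size, alignment, allocator, free) and that library functions have already been resolved to addresses.

```python
from dataclasses import dataclass, replace

# Positions of the callbacks in
# _obstack_begin(h, size, alignment, chunkfun, freefun).
CHUNKFUN_ARG = 3
FREEFUN_ARG = 4


@dataclass(frozen=True)
class Call:
    """A toy call site: callee name plus integer argument values."""
    callee: str
    args: tuple


def rewrite_obstack_begin(calls, externals):
    """Return a new call list in which every _obstack_begin call passes
    the resolved malloc/free addresses instead of the original ones."""
    rewritten = []
    for call in calls:
        if call.callee == "_obstack_begin":
            args = list(call.args)
            args[CHUNKFUN_ARG] = externals["malloc"]
            args[FREEFUN_ARG] = externals["free"]
            call = replace(call, args=tuple(args))
        rewritten.append(call)
    return rewritten


# The call from the disassembly above; the addresses in `externals`
# are made up for this example.
calls = [Call("_obstack_begin", (0x80521E0, 0, 0, 0x8048E40, 0x8048CD0))]
externals = {"malloc": 0x9049100, "free": 0x9049110}
fixed = rewrite_obstack_begin(calls, externals)
```

Calls to any other function pass through unchanged, and the obstack pointer, size, and alignment arguments are left alone.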

I've confirmed that three coreutils samples use obstack:

  • ls
  • tac
  • dircolors

ameily • Mar 30 '22 17:03

I think this is a solid plan for now. We should create an issue to go back and undo this (and the atexit) if / when we add support for external callbacks.

At worst, we run the risk of replacing a custom obstack allocator with malloc and free. I don't think custom allocators are a common enough practice for BinRec to worry about supporting at this point. This should be a performance concern rather than a soundness concern.

michaelbrownuc • Mar 31 '22 14:03

Related issue: trailofbits/binrec-prerelease#210. Several of these binaries fail due to lack of support for callbacks, so we need to come up with a method for handling them. Since callbacks are fairly rare, I wonder if we could handle them the way we handle libc functions. Specifically, we could create a separate data file listing known functions that register callbacks. A new pass would look for calls to those functions and replace each function pointer parameter with a reference to a trampoline function that calls the appropriate callback.

In this way, the library or OS gets a function pointer it can rely on, and we use existing BinRec infrastructure (identifying external function calls, inserting helper trampolines, etc.) to implement the callback. The downside is that support is limited to the functions we generate data for, but I think we could realistically enumerate them and add support for new libraries easily.
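A rough sketch of the data-driven idea, again on a toy model rather than real BinRec or LLVM code. The data file layout, the `insert_trampolines` helper, and the `trampoline_<address>` naming are all assumptions made for illustration; the table entries simply record which parameters of each function are function pointers.

```python
import json

# Hypothetical data file: for each function that takes a callback, the
# zero-based indices of its function-pointer parameters.
CALLBACK_DATA = json.loads("""
{
  "atexit": [0],
  "qsort": [3],
  "_obstack_begin": [3, 4]
}
""")


def insert_trampolines(calls, callback_data):
    """Rewrite (callee, args) pairs so that every known callback argument
    refers to a trampoline instead of the original code address.

    Returns the rewritten calls and a map from each original callback
    address to the trampoline name that stands in for it.
    """
    trampolines = {}
    rewritten = []
    for callee, args in calls:
        args = list(args)
        for index in callback_data.get(callee, []):
            target = args[index]
            # Reuse one trampoline per distinct callback address.
            name = trampolines.setdefault(target, f"trampoline_{target:#x}")
            args[index] = name
        rewritten.append((callee, tuple(args)))
    return rewritten, trampolines
```

Supporting a new library in this model means adding entries to the data file; the pass itself does not change. Calls to functions missing from the table are left untouched, which is the limitation noted above.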

michaelbrownuc • Jul 27 '22 02:07