main() not found

Open rofl0r opened this issue 7 years ago • 3 comments

this statically linked binary (700 KB), even though it has a named main symbol, doesn't have it included in nocode's output. from objdump -dr:

08048e5c <main>:

the spot from where it's usually possible to find main is the __libc_start_main call in _start, but here we find just its address 0x8048e5c:

__libc_start_main(0x8048e5c, __return_address(), esp1, __libc_csu_init, 0x80495e0, edx2, (reinterpret_cast<unsigned char>(esp1) & 0xfffffff0) - 4 - 4, eax3);

there's also no fun_8048e5c, so i'm clueless how to detect it.

rofl0r, Jun 04 '17 03:06

It's merged into the ignore() function. I guess there are no explicit calls to main(), and the decompiler does not know that the call to exit() does not return (it currently has no notion of noreturn functions at all), so it thinks that ignore() continues after the call.
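
A toy sketch of that effect, for illustration only. This is not snowman's code: the instruction list, the addresses, and the `reconstruct` helper are all made up, and the walk is much simpler than a real CFG-based reconstruction. It only shows how a function body can run on into the next function when a call to a non-returning function is treated as an ordinary call.

```python
# Toy model, not snowman's code: instructions are (address, kind, target).
# kind is "call", "ret", or "other"; target is only used by calls.
PROGRAM = [
    (0x100, "other", None),   # _start (the only known entry point)
    (0x104, "call", 0x200),   # call ignore
    (0x108, "ret", None),
    (0x200, "other", None),   # ignore()
    (0x204, "call", 0x400),   # call exit (never returns in reality)
    (0x300, "other", None),   # main, placed right after ignore()
    (0x304, "ret", None),
    (0x400, "ret", None),     # exit stub
]

def reconstruct(program, entries, noreturn=frozenset()):
    """Group instructions into functions by walking fall-through paths."""
    by_addr = {addr: (kind, target) for addr, kind, target in program}
    order = [addr for addr, _, _ in program]
    # Functions start at the given entries and at every direct call target.
    starts = set(entries) | {t for _, k, t in program if k == "call"}
    functions = {}
    for start in sorted(starts):
        body = []
        for addr in order[order.index(start):]:
            if addr != start and addr in starts:
                break  # ran into another known function
            body.append(addr)
            kind, target = by_addr[addr]
            if kind == "ret" or (kind == "call" and target in noreturn):
                break  # control flow ends here
        functions[start] = body
    return functions

merged = reconstruct(PROGRAM, entries=[0x100])
print([hex(a) for a in merged[0x200]])   # ignore() swallows main's instructions

fixed = reconstruct(PROGRAM, entries=[0x100], noreturn={0x400})
print([hex(a) for a in fixed[0x200]])    # ignore() stops at the call to exit
```

In this toy, main never becomes a function of its own in either run, because nothing calls it directly. Knowing that exit is noreturn only stops ignore() from absorbing it.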

To find the code for a function whose symbol name you know, select the symbol in the symbols view (Alt-Y), right-click, choose Jump to address, and select a few instructions after that address.

yegord, Jun 04 '17 08:06

thanks for your reply.

> there are no explicit calls to main()

does that matter ? imo if a function symbol is in the binary, it should be decompiled; and that's what i see with other binaries usually. for example a test program that has unused functions has them in snowman output too. in our case here there is of course a call to main, and it's passed from the entry point _start() as first argument (although in that case only as a hardcoded address) to the call to __libc_start_main(). what's odd though is that i can't see the first parameter used in __libc_start_main's decompiled code at all. maybe that is the bug. i suppose in the binary it's this spot that calls it:

 80490a2:       ff 54 24 60             call   *0x60(%esp)

> To find the code for a function whose symbol name you know, select the symbol in the symbols view (Alt-Y)

that's good to know, thanks. however i'm using only nocode as i'm trying to automate the task of decompiling binaries and the manual cleanup afterwards, so using the GUI would defeat that purpose.

rofl0r, Jun 04 '17 13:06

> does that matter ?

For the currently implemented function reconstruction algorithm — yes. See https://github.com/yegord/snowman/blob/master/src/nc/core/ir/FunctionsGenerator.cpp#L97

One could just do program.addCalledAddress() for each address of a function symbol somewhere in IRGenerator. Then, this particular problem would be gone. But it would be much nicer to have support for noreturn functions (which is on the todo list nobody is working on), which would fix this and other problems (like stack pointer computation in functions calling noreturn functions).
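
A rough sketch of what that workaround amounts to, under loud assumptions: the helper names below are hypothetical and are not snowman's API, the call target `0x8048E00` is invented, and attributing an address to the nearest preceding entry point is a simplification. Only the address of main comes from the objdump output in this thread.

```python
# Toy model with hypothetical names; this is not snowman's API.
MAIN = 0x8048E5C  # address of main, from the objdump output in this thread

def entry_points(direct_call_targets, function_symbols=()):
    """Addresses at which function reconstruction would start."""
    # Treating every function symbol as "called" is the suggested workaround.
    return sorted(set(direct_call_targets) | set(function_symbols))

def owner(address, entries):
    """Entry point of the function an address would be attributed to."""
    candidates = [e for e in entries if e <= address]
    return max(candidates) if candidates else None

# 0x8048E00 is a made-up direct call target located before main.
calls_only = entry_points([0x8048E00])
with_symbols = entry_points([0x8048E00], [MAIN])

print(hex(owner(MAIN + 4, calls_only)))    # attributed to the preceding function
print(hex(owner(MAIN + 4, with_symbols)))  # attributed to main itself
```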

> in our case here there is of course a call to main

The decompiler does not do constant propagation through function boundaries, so, it does not know that the stack argument being called points to main.

> i can't see the first parameter used in __libc_start_main's decompiled code at all. maybe that is the bug.

The first argument is a pointer to main. main is called here:

                eax57 = reinterpret_cast<void**>(v9(a2, v21, v22, 0x80be77b));

v9 is defined here:

    v9 = reinterpret_cast<int32_t>(__return_address());

Clearly, v9 should be a1 instead. Apparently, the decompiler incorrectly estimates the value of esp at the point of the call (off by 4 bytes). You can investigate further, if you like. You can follow the path in the CFG from the call site back to the beginning of the function and see where the difference comes from: open the tree inspector, click on v9 at the place of the call, IR term, address, left, definitions, <1004:0..31>, Memory Location Access, statement, right, definitions, <1004:0..31>, and so on. For each IR term you can click on value properties to see the estimated stack offset (relative to the beginning of the function). Somewhere it must become wrong. Most likely, there is a call to a function with a weird calling convention somewhere on the way.
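
To see how a 4-byte error turns a1 into the return address, here is a small worked example. It assumes 32-bit x86 with cdecl, and that the call in question is the `call *0x60(%esp)` quoted earlier in the thread. The frame depth of 0x5c is an assumption chosen so that the arithmetic lands on a1; the real value in this binary is not known from the thread, and neither is the direction of the error (too low is assumed here because it matches the observed output).

```python
# Offsets are relative to esp at function entry (32-bit x86, cdecl):
# offset 0 holds the return address, offset 4 the first argument (a1).
SLOTS = {0: "__return_address()", 4: "a1", 8: "a2"}

def resolve_call_target(esp_offset_at_call, displacement=0x60):
    """Name the entry-relative stack slot read by `call *disp(%esp)`."""
    return SLOTS.get(esp_offset_at_call + displacement, "unknown")

# Assumed frame depth: esp is 0x5c below its entry value at the call.
actual = resolve_call_target(-0x5C)

# An estimate that is 4 bytes too low selects the neighbouring slot.
estimated = resolve_call_target(-0x5C - 4)

print(actual)     # the slot the hardware really reads
print(estimated)  # the slot the decompiler would report
```

With these assumed numbers the true slot is a1 and the mis-estimated one is the return address, which matches the `v9 = __return_address()` line above.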

> i'm trying to automate the task of decompiling binaries and the manual cleanup afterwards

Good luck. People write PhD theses about it, still not solving the problem.

yegord, Jun 04 '17 20:06