rizin
Resolve data relocations in 32bit position-independent binaries
Description
Modern distributions of GCC produce position-independent binaries in 32-bit mode by default. This is implemented by calling a utility function like __x86.get_pc_thunk.ax, which just returns the return address from the stack; all addressing is then performed relative to the obtained address. Rizin, however, doesn't recognize this pattern, so it's not easy to analyze even a simple helloworld statically:
┌ (fcn) main 60
│ int main (int argc, char **argv, char **envp);
│ ; var int var_8h @ ebp-0x8
│ ; arg int arg_4h @ esp+0x4
│ 0x00001199 8d4c2404 lea ecx, [arg_4h]
│ 0x0000119d 83e4f0 and esp, 0xfffffff0
│ 0x000011a0 ff71fc push dword [ecx - 4]
│ 0x000011a3 55 push ebp
│ 0x000011a4 89e5 mov ebp, esp
│ 0x000011a6 53 push ebx
│ 0x000011a7 51 push ecx
│ 0x000011a8 e828000000 call sym.__x86.get_pc_thunk.ax
│ 0x000011ad 052b2e0000 add eax, 0x2e2b ; '+.'
│ 0x000011b2 83ec0c sub esp, 0xc
│ 0x000011b5 8d9030e0ffff lea edx, [eax - 0x1fd0]
│ 0x000011bb 52 push edx ; const char *s
│ 0x000011bc 89c3 mov ebx, eax
│ 0x000011be e86dfeffff call sym.imp.puts ; int puts(const char *s)
│ 0x000011c3 83c410 add esp, 0x10
│ 0x000011c6 b800000000 mov eax, 0
│ 0x000011cb 8d65f8 lea esp, [var_8h]
│ 0x000011ce 59 pop ecx
│ 0x000011cf 5b pop ebx
│ 0x000011d0 5d pop ebp
│ 0x000011d1 8d61fc lea esp, [ecx - 4]
└ 0x000011d4 c3 ret
The value of edx before the puts call is always the address of str.Hello_World, which is hard to see without debugging the program.
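For reference, the arithmetic a static analyzer would have to do here can be written out by hand. This is only a worked example using the constants from the listing above; the variable names are mine, and I'm assuming 32-bit wrap-around arithmetic.

```python
MASK32 = 0xFFFFFFFF  # x86-32 registers wrap at 32 bits

# __x86.get_pc_thunk.ax returns its own return address,
# i.e. the address of the instruction following the call at 0x11a8.
ret_addr = 0x11AD

# add eax, 0x2e2b
base = (ret_addr + 0x2E2B) & MASK32

# lea edx, [eax - 0x1fd0]
string_addr = (base - 0x1FD0) & MASK32

print(hex(base), hex(string_addr))  # 0x3fd8 0x2008
```

The result, 0x2008, matches the offset of str.Hello_World.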
Describe the solution you'd like
The *get_pc_thunk* family of functions (or maybe the pattern of returning the return address) could be treated in a special way, letting the analyzer know that they're used for EIP-relative addressing. It would let the analyzer resolve EIP-relative data references.
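As a rough illustration of the "pattern of returning the return address" idea, a thunk of the form `mov reg, [esp]; ret` could be recognized from its bytes rather than from its symbol name. This is a standalone Python sketch, not Rizin code; `thunk_register` is a hypothetical helper name, and it assumes the thunk is exactly those two instructions with nothing before them.

```python
# x86-32 register numbering as used in the ModRM reg field.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def thunk_register(code: bytes):
    """Return the register loaded by a `mov reg, [esp]; ret` thunk, else None.

    Expected encoding: 8B <modrm> 24 C3, where modrm has mod=00 and rm=100
    (SIB byte follows) and the SIB byte 0x24 selects [esp].
    """
    if len(code) < 4:
        return None
    if code[0] != 0x8B or code[2] != 0x24 or code[3] != 0xC3:
        return None
    modrm = code[1]
    if modrm & 0xC7 != 0x04:  # mod must be 00, rm must be 100
        return None
    reg = REGS[(modrm >> 3) & 7]
    # `mov esp, [esp]` is not a PC thunk, so reject it.
    return None if reg == "esp" else reg

print(thunk_register(bytes.fromhex("8b1c24c3")))  # ebx
print(thunk_register(bytes.fromhex("8b0424c3")))  # eax
```

Matching on the body instead of the name would also cover stripped binaries, where the `__x86.get_pc_thunk.*` symbols are not available.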
Additional context
Here's a screenshot of IDA, which recognizes the string literal (the offset of str.Hello_World is 0x2008):
The helloworld binary from the example
by @loskutov
Additional notes by @fabianMendez:
In Android shared libraries (with PIC), references are calculated in a way that confuses radare2. First, it sets a register to a base address (the .got.plt VA). I've found two different methods:
call $+5 + pop reg (related to #11309)
0x001b7d36 e800000000 call 0x1b7d3b
0x001b7d3b 5b pop ebx
0x001b7d3c 81c30d4c7b01 add ebx, 0x17b4c0d
call a function which returns $+5
0x00000c86 e8b5ffffff call fcn.mov_ebx_esp
0x00000c8b 81c3e1220000 add ebx, 0x22e1
; fcn.mov_ebx_esp ();
0x00000c40 8b1c24 mov ebx, dword [esp]
0x00000c43 c3 ret
Then it uses this register across the function to get references to global variables or strings:
0x00001025 8d83f2e4ffff lea eax, [ebx - 0x1b0e]
0x0000102b 890424 mov dword [esp], eax
0x0000102e e86dfaffff call sym.imp.opendir
I'd like to get register-based cross-references in shared libraries with PIC enabled:
0x00001025 8d83f2e4ffff lea eax, str.proc_self_id
0x0000102b 890424 mov dword [esp], eax
0x0000102e e86dfaffff call sym.imp.opendir
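To make the request a bit more concrete, here is a minimal sketch of how both base-setting idioms could be tracked to resolve such references. It is plain Python over a simplified instruction model, not the Rizin analysis API: `Insn`, `resolve_pic_refs` and the operand tuples are all hypothetical, and the sketch deliberately ignores register clobbering, branches and anything beyond `call`, `pop`, `add`, `mov` and `lea`.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF

@dataclass
class Insn:
    addr: int
    size: int
    mnemonic: str
    # Operand layout (an assumption of this sketch):
    # call: (target,)  pop: (reg,)  add: (reg, imm)
    # mov: (dst, src)  lea: (dst, base_reg, disp)
    ops: tuple

def resolve_pic_refs(insns, thunks):
    """thunks maps a thunk's address to the register it loads.

    Returns (refs, regs): refs is a list of (insn address, resolved
    address) pairs, regs holds the last known value of each tracked register.
    """
    regs = {}         # register -> statically known value
    refs = []
    pushed_pc = None  # return address left on the stack by `call $+5`
    for insn in insns:
        next_addr = insn.addr + insn.size
        m, ops = insn.mnemonic, insn.ops
        if m == "call" and ops[0] in thunks:
            regs[thunks[ops[0]]] = next_addr       # thunk returns its return address
        elif m == "call" and ops[0] == next_addr:
            pushed_pc = next_addr                  # call $+5
            continue
        elif m == "pop" and pushed_pc is not None:
            regs[ops[0]] = pushed_pc               # pop reg right after call $+5
        elif m == "add" and ops[0] in regs:
            regs[ops[0]] = (regs[ops[0]] + ops[1]) & MASK32
        elif m == "mov" and ops[1] in regs:
            regs[ops[0]] = regs[ops[1]]            # e.g. mov ebx, eax
        elif m == "lea" and ops[1] in regs:
            refs.append((insn.addr, (regs[ops[1]] + ops[2]) & MASK32))
        pushed_pc = None
    return refs, regs

# The helloworld example: the call at 0x11a8 targets the thunk at 0x11d5.
hello = [
    Insn(0x11A8, 5, "call", (0x11D5,)),
    Insn(0x11AD, 5, "add", ("eax", 0x2E2B)),
    Insn(0x11B5, 6, "lea", ("edx", "eax", -0x1FD0)),
]
refs, _ = resolve_pic_refs(hello, {0x11D5: "eax"})
print([(hex(a), hex(t)) for a, t in refs])  # [('0x11b5', '0x2008')]
```

With the resolved address in hand, the `lea` operand could be displayed as a flag name (str.Hello_World, str.proc_self_id) the same way absolute references already are.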
If a library has PIC enabled, Rizin should not make references like this:
; "_traitsIS6_EEEEEEvRNS0_11timer_queueIT_EERKNSB_9time_typeERNSC_14per_timer_dataEPNS0_7wait_opE" │
0x001b7d54 6806000100 push 0x10006
[0x001b7d2b]> ps @ 0x10006
_traitsIS6_EEEEEEvRNS0_11timer_queueIT_EERKNSB_9time_typeERNSC_14per_timer_dataEPNS0_7wait_opE
I think it should be assumed that addresses are going to be calculated in some way.
Reference binaries: libnative-lib.zip libtermux.zip
See the file parsing at
- librz/bin/format/elf/
- librz/bin/p/bin_elf.c
@lxyAnnie This issue is assigned to you as an RSoC micro-task, so you can work on it from now on. Feel free to ask questions and send Pull Requests.
In case anyone is not aware of RSoC.
CC @XVilka
@lxyAnnie After committing and pushing to your forked Rizin repo, a separate Pull Request to this repo is needed so that someone else can review it. Your commits will be merged into the mainline after review. Merely committing to your own repo and mentioning this issue will have no effect on this main repo.