
Resolve data relocations in 32bit position-independent binaries

Open XVilka opened this issue 4 years ago • 2 comments

Description

Modern distributions of GCC produce position-independent binaries in 32-bit mode by default. This is implemented by calling a utility function such as __x86.get_pc_thunk.ax, which just loads its own return address from the stack into a register (eax here) and returns. All addressing is then performed relative to the obtained address. Rizin, however, doesn't recognize this pattern, so it's not easy to analyze even a simple helloworld statically:

┌ (fcn) main 60
│   int main (int argc, char **argv, char **envp);
│           ; var int var_8h @ ebp-0x8
│           ; arg int arg_4h @ esp+0x4
│           0x00001199      8d4c2404       lea ecx, [arg_4h]
│           0x0000119d      83e4f0         and esp, 0xfffffff0
│           0x000011a0      ff71fc         push dword [ecx - 4]
│           0x000011a3      55             push ebp
│           0x000011a4      89e5           mov ebp, esp
│           0x000011a6      53             push ebx
│           0x000011a7      51             push ecx
│           0x000011a8      e828000000     call sym.__x86.get_pc_thunk.ax
│           0x000011ad      052b2e0000     add eax, 0x2e2b             ; '+.'
│           0x000011b2      83ec0c         sub esp, 0xc
│           0x000011b5      8d9030e0ffff   lea edx, [eax - 0x1fd0]
│           0x000011bb      52             push edx                    ; const char *s
│           0x000011bc      89c3           mov ebx, eax
│           0x000011be      e86dfeffff     call sym.imp.puts           ; int puts(const char *s)
│           0x000011c3      83c410         add esp, 0x10
│           0x000011c6      b800000000     mov eax, 0
│           0x000011cb      8d65f8         lea esp, [var_8h]
│           0x000011ce      59             pop ecx
│           0x000011cf      5b             pop ebx
│           0x000011d0      5d             pop ebp
│           0x000011d1      8d61fc         lea esp, [ecx - 4]
└           0x000011d4      c3             ret

Just before the call to puts, edx always holds the address of str.Hello_World, but this is hard to see without debugging the program.
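The computation can be replayed by hand. The thunk returns 0x11ad, the address of the instruction after the call. The add turns that into 0x3fd8 (presumably the GOT address, since this is the usual i386 PIC idiom), and the lea subtracts 0x1fd0. Below is a minimal C sketch; resolve_pic_ref is a hypothetical helper name, not a Rizin function:

```c
#include <stdint.h>

/* Resolve a data address built by the sequence
 *   call thunk; add reg, imm; lea dst, [reg + disp]
 * ret_addr: address of the instruction after the call (what the thunk returns)
 * add_imm:  immediate of the add that follows the call
 * lea_disp: signed displacement of the lea */
static uint32_t resolve_pic_ref(uint32_t ret_addr, uint32_t add_imm, int32_t lea_disp)
{
	uint32_t base = ret_addr + add_imm;  /* 0x11ad + 0x2e2b = 0x3fd8 in main() */
	return base + (uint32_t)lea_disp;    /* wraps modulo 2^32, like the CPU does */
}
```

resolve_pic_ref(0x11ad, 0x2e2b, -0x1fd0) gives 0x2008, the offset IDA reports for str.Hello_World.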

Describe the solution you'd like

The *get_pc_thunk* family of functions (or, more generally, any function that just returns its own return address) could be treated specially, so that the analyzer knows they are used for EIP-relative addressing. The analyzer could then resolve EIP-relative data references.
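The idea can be illustrated as a small data-flow walk. A call to a PC thunk sets its destination register to the address of the next instruction, an add reg, imm on that register updates the value, and a lea dst, [reg + disp] becomes a data reference. The following is a minimal C sketch over a hypothetical, simplified instruction model (enum op_kind, struct insn and find_pic_xrefs are illustrative names, not Rizin APIs). It only handles straight-line code and ignores register copies such as mov ebx, eax:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified instruction model: only the operations
 * needed to follow a PC-thunk base register. */
enum op_kind { OP_CALL_PC_THUNK, OP_ADD_IMM, OP_LEA, OP_OTHER };

struct insn {
	uint32_t addr;     /* address of this instruction */
	uint32_t size;     /* encoded length in bytes */
	enum op_kind kind;
	int dst;           /* destination register 0..7 (eax..edi), or -1 */
	int src;           /* base register of an OP_LEA, or -1 */
	int32_t imm;       /* add immediate, or lea displacement */
};

struct xref {
	uint32_t from, to;
};

static bool is_reg(int r) { return r >= 0 && r < 8; }

/* Walk a straight-line instruction sequence, remember registers whose value
 * is known because a PC thunk set them, and record a data xref for every
 * lea based on such a register. Returns the number of xrefs stored in out. */
static size_t find_pic_xrefs(const struct insn *code, size_t n,
		struct xref *out, size_t max_out)
{
	bool known[8] = { false };
	uint32_t value[8] = { 0 };
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct insn *in = &code[i];
		switch (in->kind) {
		case OP_CALL_PC_THUNK:
			/* The thunk returns the address of the instruction after the call. */
			if (is_reg(in->dst)) {
				known[in->dst] = true;
				value[in->dst] = in->addr + in->size;
			}
			break;
		case OP_ADD_IMM:
			if (is_reg(in->dst) && known[in->dst])
				value[in->dst] += (uint32_t)in->imm;
			break;
		case OP_LEA:
			if (is_reg(in->src) && known[in->src] && count < max_out) {
				out[count].from = in->addr;
				out[count].to = value[in->src] + (uint32_t)in->imm;
				count++;
			}
			/* dst now holds a data pointer; this sketch does not track it further. */
			if (is_reg(in->dst))
				known[in->dst] = false;
			break;
		default:
			/* Anything we cannot model invalidates its destination register. */
			if (is_reg(in->dst))
				known[in->dst] = false;
			break;
		}
	}
	return count;
}
```

Fed the three relevant instructions from main() (the call at 0x11a8, the add at 0x11ad and the lea at 0x11b5), it reports a single xref from 0x11b5 to 0x2008. Inside Rizin this would presumably build on the existing emulation-based analysis rather than a separate walker; the sketch only shows the data flow.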

Additional context

Here's a screenshot of IDA, which recognizes the string literal (the offset of str.Hello_World is 0x2008):

[Screenshot: IDA showing the resolved string reference]

The helloworld binary from the example

by @loskutov

Additional notes by @fabianMendez:

In Android shared libraries (built with PIC), references are calculated in a way that confuses radare2. First, the code sets a register to a base address (the .got.plt VA). I've found two different methods:

call $+5 followed by pop reg (related to #11309)

0x001b7d36      e800000000     call 0x1b7d3b
0x001b7d3b      5b             pop ebx
0x001b7d3c      81c30d4c7b01   add ebx, 0x17b4c0d
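The first method can be recognized directly from the bytes shown above: e8 00000000 is a call to the very next instruction, 58+r is pop r32, and 81 c0+r imm32 is add r32, imm32. Here is a minimal C sketch of such a byte-pattern matcher. match_call_pop_add and rd32 are hypothetical names, and a real implementation in Rizin would presumably work on decoded instructions rather than raw bytes:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read from an instruction stream. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Recognize "call $+5; pop r32; add r32, imm32" starting at address addr
 * (bytes e8 00 00 00 00, 58+r, 81 c0+r imm32). On success store the
 * register index (0 = eax ... 7 = edi) and the resulting base value. */
static bool match_call_pop_add(const uint8_t *b, size_t len, uint32_t addr,
		int *reg, uint32_t *base)
{
	if (len < 12)
		return false;
	if (b[0] != 0xe8 || rd32(b + 1) != 0)
		return false;                  /* not call $+5 */
	if ((b[5] & 0xf8) != 0x58)
		return false;                  /* not pop r32 */
	int r = b[5] & 7;
	if (b[6] != 0x81 || b[7] != (0xc0 | r))
		return false;                  /* not add r32, imm32 on the same register */
	*reg = r;
	*base = (addr + 5) + rd32(b + 8);  /* popped return address + immediate */
	return true;
}
```

For the bytes at 0x001b7d36 this reports ebx with a base of 0x1b7d3b + 0x17b4c0d = 0x196c948.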

call a function that returns its own return address ($+5 relative to the call)

0x00000c86      e8b5ffffff     call fcn.mov_ebx_esp
0x00000c8b      81c3e1220000   add ebx, 0x22e1

; fcn.mov_ebx_esp ();
0x00000c40      8b1c24         mov ebx, dword [esp]
0x00000c43      c3             ret
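For the second method, the callee itself can be classified: a body consisting of mov r32, dword [esp] followed by ret does nothing but hand the return address to the caller. Here is a C sketch of such a check (match_pc_thunk is a hypothetical name). It assumes the thunk body is exactly these four bytes, so variants with padding or other encodings are not matched:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Recognize a PC thunk body "mov r32, dword [esp]; ret": opcode 8b, a ModRM
 * byte with mod=00 and rm=100 (SIB follows), SIB 0x24 ([esp]), then c3.
 * This matches fcn.mov_ebx_esp above (8b 1c 24 c3) and __x86.get_pc_thunk.ax
 * as GCC usually emits it (8b 04 24 c3).
 * On success store the destination register index (0 = eax ... 7 = edi). */
static bool match_pc_thunk(const uint8_t *b, size_t len, int *reg)
{
	if (len < 4 || b[0] != 0x8b || b[2] != 0x24 || b[3] != 0xc3)
		return false;
	if ((b[1] & 0xc7) != 0x04)   /* require mod=00, rm=100 */
		return false;
	*reg = (b[1] >> 3) & 7;      /* the ModRM reg field is the destination */
	return true;
}
```

Once a callee is classified this way, a call to it can be handled like the call $+5 / pop case: the register receives the address after the call, so in the listing above ebx = 0xc8b + 0x22e1 = 0x2f6c after the add.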

Then it uses this register throughout the function to reference global variables or strings:

0x00001025      8d83f2e4ffff   lea eax, [ebx - 0x1b0e]
0x0000102b      890424         mov dword [esp], eax
0x0000102e      e86dfaffff     call sym.imp.opendir

I'd like to get register-based cross-references in shared libraries with PIC enabled, e.g.:

0x00001025      8d83f2e4ffff   lea eax, str.proc_self_id
0x0000102b      890424         mov dword [esp], eax
0x0000102e      e86dfaffff     call sym.imp.opendir
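Producing the desired xref then comes down to decoding the lea and adding its signed displacement to the tracked base. Here is a C sketch for the lea r32, [reg + disp32] form only (opcode 8d with ModRM mod=10 and no SIB); resolve_lea_disp32 is a hypothetical name. The actual address of str.proc_self_id depends on the ebx value set earlier in that function, which the snippet does not show:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode "lea r32, [base + disp32]" (8d /r, ModRM mod=10, no SIB). If the
 * base register is base_reg, whose value base_value is known, store the
 * destination register and the referenced address. */
static bool resolve_lea_disp32(const uint8_t *b, size_t len,
		int base_reg, uint32_t base_value,
		int *dst, uint32_t *target)
{
	if (len < 6 || b[0] != 0x8d)
		return false;
	uint8_t modrm = b[1];
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)
		return false;          /* only mod=10 without SIB is handled */
	if ((modrm & 7) != base_reg)
		return false;          /* based on some other register */
	uint32_t disp = (uint32_t)b[2] | (uint32_t)b[3] << 8 |
	                (uint32_t)b[4] << 16 | (uint32_t)b[5] << 24;
	*dst = (modrm >> 3) & 7;
	*target = base_value + disp; /* two's complement wrap handles negative disp */
	return true;
}
```

Applied to the instruction at 0x11b5 in main() (8d9030e0ffff, with eax = 0x3fd8), it yields edx and 0x2008, matching str.Hello_World. For 8d83f2e4ffff it would give ebx - 0x1b0e once the ebx value is known.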

If a library has PIC enabled, Rizin should not create references like this:

; "_traitsIS6_EEEEEEvRNS0_11timer_queueIT_EERKNSB_9time_typeERNSC_14per_timer_dataEPNS0_7wait_opE"
0x001b7d54 6806000100     push 0x10006
[0x001b7d2b]> ps @ 0x10006
_traitsIS6_EEEEEEvRNS0_11timer_queueIT_EERKNSB_9time_typeERNSC_14per_timer_dataEPNS0_7wait_opE

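I think Rizin should assume that addresses in PIC code are computed at run time (relative to a base register) instead of treating immediates like 0x10006 as absolute addresses.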
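Deciding whether "a library has PIC enabled" needs some criterion. One simple heuristic, assumed here rather than taken from Rizin, is the ELF header type: shared objects and PIE executables are ET_DYN. Here is a C sketch that reads e_type straight from the header bytes (elf_is_dyn is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic: treat an ELF image as position-independent if its e_type is
 * ET_DYN (3). e_type is the 16-bit field at offset 16, right after e_ident,
 * stored in the byte order given by e_ident[EI_DATA] (offset 5). */
static bool elf_is_dyn(const uint8_t *hdr, size_t len)
{
	if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
		return false;
	uint16_t e_type;
	if (hdr[5] == 1)            /* ELFDATA2LSB */
		e_type = (uint16_t)(hdr[16] | hdr[17] << 8);
	else if (hdr[5] == 2)       /* ELFDATA2MSB */
		e_type = (uint16_t)(hdr[16] << 8 | hdr[17]);
	else
		return false;
	return e_type == 3;         /* ET_DYN */
}
```

When this returns true, bare immediates such as the push 0x10006 above would not be turned into data xrefs; only addresses derived from a tracked base register would. ET_DYN is only an approximation: 32-bit x86 shared objects can also be built without PIC and rely on text relocations.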

Reference binaries: libnative-lib.zip libtermux.zip

See the ELF parsing code in:

  • librz/bin/format/elf/
  • librz/bin/p/bin_elf.c

XVilka, Jan 08 '21 05:01

@lxyAnnie This issue is assigned to you as an RSoC micro-task, so you can work on it from now on. Feel free to ask questions and send pull requests.

This is for anyone who is not aware of RSoC.

CC @XVilka

fangxlmr, Mar 15 '21 01:03

@lxyAnnie After committing and pushing to your fork of Rizin, you need to open a separate pull request against this repository so that someone else can review it. Your commits will be merged into the mainline after review. Merely committing to your own fork and mentioning this issue will, at the end of the day, have no effect on the main repository.

fangxlmr, Mar 24 '21 05:03