r2 is unable to identify functions within stripped assembled binaries (no C)

Open kamathe opened this issue 4 years ago • 6 comments

setup information

$ 
$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.9 (Maipo)
$ 

r2 version

$ r2 -v
radare2 4.6.0-git 25247 @ linux-x86-64 git.4.4.0-908-g91aebb6
commit: 91aebb64901285f5177151604816d270d0862039 build: 2020-11-09__01:33:51
$ 

Expected results

r2 should be able to identify functions in stripped binaries (assembled and linked, no C)

Actual results

r2 is unable to identify functions in stripped binaries (assembled and linked, no C)

Sample assembly code which has a function named "power"

$ cat power.s
.section .data

.section .text

.globl _start
_start:
 pushl $3
 pushl $2
 call power
 addl $8, %esp
 pushl %eax
 pushl $2
 pushl $5
 call power
 addl $8, %esp
 popl %ebx
 addl %eax, %ebx
 movl $1, %eax
 int $0x80


.type power, @function
power:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 movl 8(%ebp), %ebx
 movl 12(%ebp), %ecx
 movl %ebx, -4(%ebp)

power_loop_start:
 cmpl $1, %ecx
 je end_power
 movl -4(%ebp), %eax
 movl %eax, -4(%ebp)
 decl %ecx
 jmp power_loop_start

end_power:
 movl -4(%ebp), %eax
 movl %ebp, %esp
 popl %ebp
 ret
$

Assemble the code using GNU as and link it to create a binary

$ as power.s -o power.o --32
$ 
$ file power.o
power.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
$ 
$ ld power.o -o power -m elf_i386
$ 
$ file power
power: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ 
$ ./power 
$

Load the unstripped binary in r2 and analyze it; r2 is able to identify the power function

$ r2 -A ./power
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
 -- I love the smell of bugs in the morning.
[0x08048054]> 
[0x08048054]> afl
0x08048054    1 35           entry0
0x08048077    4 36           sym.power                 <<<<<<<<<<<<<<<<< (power function)
[0x08048054]> 
[0x08048054]> pdf @ entry0
            ;-- section..text:
            ;-- .text:
            ;-- _start:
            ;-- eip:
┌ 35: entry0 ();
│           ; var int32_t var_4h @ ebp-0x4
│           0x08048054      6a03           pushl $3                    ; 3 ; int32_t arg_ch ; [01] -r-x section size 71 named .text
│           0x08048056      6a02           pushl $2                    ; 2 ; int32_t arg_8h
│           0x08048058      e81a000000     calll sym.power
│           0x0804805d      83c408         addl $8, %esp
│           0x08048060      50             pushl %eax
│           0x08048061      6a02           pushl $2                    ; 2 ; int32_t arg_ch
│           0x08048063      6a05           pushl $5                    ; 5 ; int32_t arg_8h
│           0x08048065      e80d000000     calll sym.power
│           0x0804806a      83c408         addl $8, %esp
│           0x0804806d      5b             popl %ebx
│           0x0804806e      01c3           addl %eax, %ebx
│           0x08048070      b801000000     movl $1, %eax
└           0x08048075      cd80           int $0x80
[0x08048054]> 
[0x08048054]> pdf @ sym.power
            ; CALL XREFS from entry0 @ 0x8048058, 0x8048065
┌ 36: sym.power (int32_t arg_8h, int32_t arg_ch);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_8h @ ebp+0x8
│           ; arg int32_t arg_ch @ ebp+0xc
│           0x08048077      55             pushl %ebp
│           0x08048078      89e5           movl %esp, %ebp
│           0x0804807a      83ec04         subl $4, %esp
│           0x0804807d      8b5d08         movl arg_8h, %ebx
│           0x08048080      8b4d0c         movl arg_ch, %ecx
│           0x08048083      895dfc         movl %ebx, -4(%ebp)
│           ; CODE XREF from sym.power @ 0x8048092
│           ;-- power_loop_start:
│       ┌─> 0x08048086      83f901         cmpl $1, %ecx               ; 1
│      ┌──< 0x08048089      7409           je loc.end_power
│      │╎   0x0804808b      8b45fc         movl var_4h, %eax
│      │╎   0x0804808e      8945fc         movl %eax, var_4h
│      │╎   0x08048091      49             decl %ecx
│      │└─< 0x08048092      ebf2           jmp loc.power_loop_start
│      │    ; CODE XREF from sym.power @ 0x8048089
│      │    ;-- end_power:
│      └──> 0x08048094      8b45fc         movl var_4h, %eax
│           0x08048097      89ec           movl %ebp, %esp
│           0x08048099      5d             popl %ebp
└           0x0804809a      c3             retl
[0x08048054]> q
$ 

Now strip the binary

$ 
$ cp power power.strip
$ 
$ strip power.strip 
$ 
$ file power.strip 
power.strip: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ 

Load the stripped binary in r2 and list the functions. r2 doesn't show an equivalent function (with address) in place of the "power" function

$ 
$ r2 -A ./power.strip 
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
 -- Yo dawg!
[0x08048054]> 
[0x08048054]> afl
0x08048054    4 71           entry0               <<<<<<<<<<<<<<< (no power function or equivalent)
[0x08048054]> 
[0x08048054]> pdf @ entry0
            ;-- section..text:
            ;-- eip:
┌ 71: entry0 (int32_t arg_8h, int32_t arg_ch);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_8h @ ebp+0x8
│           ; arg int32_t arg_ch @ ebp+0xc
│           0x08048054      6a03           pushl $3                    ; 3 ; [01] -r-x section size 71 named .text
│           0x08048056      6a02           pushl $2                    ; 2
│           0x08048058      e81a000000     calll 0x8048077          <<<<<<<<< (calls to power function)
│           0x0804805d      83c408         addl $8, %esp
│           0x08048060      50             pushl %eax
│           0x08048061      6a02           pushl $2                    ; 2
│           0x08048063      6a05           pushl $5                    ; 5
│           0x08048065      e80d000000     calll 0x8048077          <<<<<<<<< (calls to power function)
│           0x0804806a      83c408         addl $8, %esp
│           0x0804806d      5b             popl %ebx
│           0x0804806e      01c3           addl %eax, %ebx
│           0x08048070      b801000000     movl $1, %eax
│           0x08048075      cd80           int $0x80
│           ; CALL XREFS from entry0 @ 0x8048058, 0x8048065
│           0x08048077      55             pushl %ebp                    <<<<<<<<<<< (start of power function)
│           0x08048078      89e5           movl %esp, %ebp
│           0x0804807a      83ec04         subl $4, %esp
│           0x0804807d      8b5d08         movl arg_8h, %ebx
│           0x08048080      8b4d0c         movl arg_ch, %ecx
│           0x08048083      895dfc         movl %ebx, var_4h
│           ; CODE XREF from entry0 @ 0x8048092
│       ┌─> 0x08048086      83f901         cmpl $1, %ecx               ; 1
│      ┌──< 0x08048089      7409           je 0x8048094
│      │╎   0x0804808b      8b45fc         movl var_4h, %eax
│      │╎   0x0804808e      8945fc         movl %eax, var_4h
│      │╎   0x08048091      49             decl %ecx
│      │└─< 0x08048092      ebf2           jmp 0x8048086
│      │    ; CODE XREF from entry0 @ 0x8048089
│      └──> 0x08048094      8b45fc         movl var_4h, %eax
│           0x08048097      89ec           movl %ebp, %esp
│           0x08048099      5d             popl %ebp
└           0x0804809a      c3             retl
[0x08048054]> 
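
For reference, the call targets marked above can be checked by hand from the instruction bytes. The following is a minimal sketch in Python, not anything r2 does internally; the helper name `call_target` is made up for illustration. It decodes the two `e8` (call rel32) instructions from the listing and shows that both land on 0x08048077, the address where the power function starts.

```python
import struct


def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the destination of a 5-byte x86 `call rel32` (opcode 0xE8).

    Hypothetical helper for illustration only.
    """
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # rel32 is a signed little-endian displacement, relative to the
    # address of the instruction that follows the call.
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    return (insn_addr + 5 + rel32) & 0xFFFFFFFF


# Addresses and bytes copied from the pdf output of the stripped binary.
for addr, raw in [(0x08048058, "e81a000000"), (0x08048065, "e80d000000")]:
    print(hex(addr), "->", hex(call_target(addr, bytes.fromhex(raw))))
```

So the cross-reference information needed to place a function at 0x08048077 is present in the stripped binary; only the symbol name is gone.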

kamathe avatar Nov 21 '20 06:11 kamathe

aaaa is only supposed to be used on standard binaries.

For your case I would use e anal.hasnext=1; afr or aac

trufae avatar Nov 21 '20 15:11 trufae

Tried the suggested steps, did not make a difference

Stripped binary from before

$ file power.strip 
power.strip: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ 

Loading into r2 and enabling anal.hasnext

$ r2 ./power.strip
 -- This incident will be reported
[0x08048054]> 
[0x08048054]> e | grep anal.hasnext
anal.hasnext = false
[0x08048054]> 
[0x08048054]> e anal.hasnext = 1
[0x08048054]> 
[0x08048054]> e | grep anal.hasnext
anal.hasnext = true
[0x08048054]> 

I ran both afr and aac; however, afl still does not detect or list the function

[0x08048054]> afr
[0x08048054]> 
[0x08048054]> afl
0x08048054    4 71           entry0
[0x08048054]> 
[0x08048054]> aac
[0x08048054]> 
[0x08048054]> afl
0x08048054    4 71           entry0
[0x08048054]> 

We can see the function is being called multiple times in below disassembly of entry0

[0x08048054]> pdf @ entry0
            ;-- section..text:
            ;-- eip:
┌ 71: entry0 (int32_t arg_8h, int32_t arg_ch);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_8h @ ebp+0x8
│           ; arg int32_t arg_ch @ ebp+0xc
│           0x08048054      6a03           pushl $3                    ; 3 ; [01] -r-x section size 71 named .text
│           0x08048056      6a02           pushl $2                    ; 2
│           0x08048058      e81a000000     calll 0x8048077      <<<<<<<< function call
│           0x0804805d      83c408         addl $8, %esp
│           0x08048060      50             pushl %eax
│           0x08048061      6a02           pushl $2                    ; 2
│           0x08048063      6a05           pushl $5                    ; 5
│           0x08048065      e80d000000     calll 0x8048077      <<<<<<<< function call
│           0x0804806a      83c408         addl $8, %esp
│           0x0804806d      5b             popl %ebx
│           0x0804806e      01c3           addl %eax, %ebx
│           0x08048070      b801000000     movl $1, %eax
│           0x08048075      cd80           int $0x80
│           ; CALL XREFS from entry0 @ 0x8048054, 0x8048058, 0x8048065
│           0x08048077      55             pushl %ebp
│           0x08048078      89e5           movl %esp, %ebp
│           0x0804807a      83ec04         subl $4, %esp
│           0x0804807d      8b5d08         movl arg_8h, %ebx
│           0x08048080      8b4d0c         movl arg_ch, %ecx
│           0x08048083      895dfc         movl %ebx, var_4h
│           ; CODE XREFS from entry0 @ 0x8048054, 0x8048092
│       ┌─> 0x08048086      83f901         cmpl $1, %ecx               ; 1
│      ┌──< 0x08048089      7409           je 0x8048094
│      │╎   0x0804808b      8b45fc         movl var_4h, %eax
│      │╎   0x0804808e      8945fc         movl %eax, var_4h
│      │╎   0x08048091      49             decl %ecx
│      │└─< 0x08048092      ebf2           jmp 0x8048086
│      └──> 0x08048094      8b45fc         movl var_4h, %eax
│           0x08048097      89ec           movl %ebp, %esp
│           0x08048099      5d             popl %ebp
└           0x0804809a      c3             retl
[0x08048054]> 

kamathe avatar Nov 23 '20 08:11 kamathe

The issue is that the exit syscall (mov eax, 1; int 0x80) is not recognized as the end of a function, so r2 stops analyzing the function at 0x08048054 (entry0) only after reaching the ret instruction at 0x0804809a (which is actually the end of the power function). In the non-stripped binary, the power function is created forcibly during aa because there is a symbol at 0x08048077. That's why you see 2 functions there.

pelijah avatar Nov 23 '20 19:11 pelijah

You can just run af @ entry0 on both the non-stripped and the stripped binary to confirm that the issue is caused by this syscall.

pelijah avatar Nov 23 '20 19:11 pelijah

@pelijah Thank you for the detailed information, I did not take the exit syscall into account. My understanding was that the function prologue and epilogue instructions (below) should be enough to identify a function in a stripped binary. However, since this is a handwritten binary it shouldn't matter that much (other than in CTFs); most (compiled) standard binaries will have the required information in place. So I think this isn't really a bug and can be closed?

# function entry
pushl %ebp
movl %esp, %ebp


# function exit
movl %ebp, %esp
popl %ebp
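
To illustrate the prologue idea, here is a rough sketch in Python that searches the 71 bytes of .text (copied from the pdf output above) for the encoding of `pushl %ebp; movl %esp, %ebp`, which is `55 89 e5`. The names `TEXT_BASE`, `TEXT`, `PROLOGUE` and `find_prologues` are made up for this example, and a raw byte search like this is only a heuristic: it can produce false hits inside other instructions or data, and it says nothing about how r2 itself looks for functions.

```python
TEXT_BASE = 0x08048054  # start of .text, from the r2 prompt above

# The 71 bytes of .text, taken from the disassembly listing.
TEXT = bytes.fromhex(
    "6a036a02e81a00000083c408506a026a05e80d00000083c4085b01c3"  # _start body
    "b801000000cd80"                                            # exit syscall
    "5589e583ec048b5d088b4d0c895dfc"                            # power setup
    "83f90174098b45fc8945fc49ebf28b45fc89ec5dc3"                # loop and return
)

PROLOGUE = bytes.fromhex("5589e5")  # pushl %ebp; movl %esp, %ebp


def find_prologues(code: bytes, base: int) -> list[int]:
    """Return the virtual address of every occurrence of PROLOGUE in code."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits


print([hex(a) for a in find_prologues(TEXT, TEXT_BASE)])
```

On this binary the only hit is the start of the power function, so a prologue scan would have been enough here, but that is specific to this small sample.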

kamathe avatar Nov 24 '20 12:11 kamathe

It actually is a bug, because the exit syscall in the simple form as movl $1, %eax; int $0x80 should be detected and interpreted as a no_return function.
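
A minimal sketch of what that detection could look like, written in Python over the raw .text bytes from the listings in this issue. This is not radare2's implementation, and the names `EXIT_SEQ` and `end_after_exit` are invented for the example. It only handles the exact byte sequence `b8 01 00 00 00 cd 80` and ignores every other way eax could be set before `int $0x80`.

```python
TEXT_BASE = 0x08048054  # start of .text, from the r2 prompt above

# The 71 bytes of .text, taken from the disassembly listing.
TEXT = bytes.fromhex(
    "6a036a02e81a00000083c408506a026a05e80d00000083c4085b01c3"
    "b801000000cd80"
    "5589e583ec048b5d088b4d0c895dfc"
    "83f90174098b45fc8945fc49ebf28b45fc89ec5dc3"
)

# movl $1, %eax; int $0x80  (the exit syscall on 32-bit Linux)
EXIT_SEQ = bytes.fromhex("b801000000cd80")


def end_after_exit(code: bytes, base: int):
    """Return the address just past the first exit sequence, or None.

    Treating that address as the end of the current function is the
    assumption being illustrated here.
    """
    pos = code.find(EXIT_SEQ)
    if pos == -1:
        return None
    return base + pos + len(EXIT_SEQ)


end = end_after_exit(TEXT, TEXT_BASE)
print("entry0 would end at", hex(end), "size", end - TEXT_BASE)
```

With the exit treated as noreturn, entry0 would end at 0x08048077 with a size of 35 bytes, which matches what afl reports for the non-stripped binary.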

ret2libc avatar Nov 24 '20 13:11 ret2libc