radare2
r2 is unable to identify functions within stripped assembled binaries (no C)
setup information
$
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
$
r2 version
$ r2 -v
radare2 4.6.0-git 25247 @ linux-x86-64 git.4.4.0-908-g91aebb6
commit: 91aebb64901285f5177151604816d270d0862039 build: 2020-11-09__01:33:51
$
Expected results
r2 should be able to identify functions in stripped binaries (assembled and linked, no C)
Actual results
r2 is unable to identify functions in stripped binaries (assembled and linked, no C)
Sample assembly code which has a function named "power"
$ cat power.s
.section .data
.section .text
.globl _start
_start:
pushl $3
pushl $2
call power
addl $8, %esp
pushl %eax
pushl $2
pushl $5
call power
addl $8, %esp
popl %ebx
addl %eax, %ebx
movl $1, %eax
int $0x80
.type power, @function
power:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl 8(%ebp), %ebx
movl 12(%ebp), %ecx
movl %ebx, -4(%ebp)
power_loop_start:
cmpl $1, %ecx
je end_power
movl -4(%ebp), %eax
movl %eax, -4(%ebp)
decl %ecx
jmp power_loop_start
end_power:
movl -4(%ebp), %eax
movl %ebp, %esp
popl %ebp
ret
$
Assemble the code using GNU as and link it to create a binary
$ as power.s -o power.o --32
$
$ file power.o
power.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
$
$ ld power.o -o power -m elf_i386
$
$ file power
power: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$
$ ./power
$
Load the unstripped binary in r2 and analyze it; r2 is able to identify the power function
$ r2 -A ./power
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
-- I love the smell of bugs in the morning.
[0x08048054]>
[0x08048054]> afl
0x08048054 1 35 entry0
0x08048077 4 36 sym.power <<<<<<<<<<<<<<<<< (power function)
[0x08048054]>
[0x08048054]> pdf @ entry0
;-- section..text:
;-- .text:
;-- _start:
;-- eip:
┌ 35: entry0 ();
│ ; var int32_t var_4h @ ebp-0x4
│ 0x08048054 6a03 pushl $3 ; 3 ; int32_t arg_ch ; [01] -r-x section size 71 named .text
│ 0x08048056 6a02 pushl $2 ; 2 ; int32_t arg_8h
│ 0x08048058 e81a000000 calll sym.power
│ 0x0804805d 83c408 addl $8, %esp
│ 0x08048060 50 pushl %eax
│ 0x08048061 6a02 pushl $2 ; 2 ; int32_t arg_ch
│ 0x08048063 6a05 pushl $5 ; 5 ; int32_t arg_8h
│ 0x08048065 e80d000000 calll sym.power
│ 0x0804806a 83c408 addl $8, %esp
│ 0x0804806d 5b popl %ebx
│ 0x0804806e 01c3 addl %eax, %ebx
│ 0x08048070 b801000000 movl $1, %eax
└ 0x08048075 cd80 int $0x80
[0x08048054]>
[0x08048054]> pdf @ sym.power
; CALL XREFS from entry0 @ 0x8048058, 0x8048065
┌ 36: sym.power (int32_t arg_8h, int32_t arg_ch);
│ ; var int32_t var_4h @ ebp-0x4
│ ; arg int32_t arg_8h @ ebp+0x8
│ ; arg int32_t arg_ch @ ebp+0xc
│ 0x08048077 55 pushl %ebp
│ 0x08048078 89e5 movl %esp, %ebp
│ 0x0804807a 83ec04 subl $4, %esp
│ 0x0804807d 8b5d08 movl arg_8h, %ebx
│ 0x08048080 8b4d0c movl arg_ch, %ecx
│ 0x08048083 895dfc movl %ebx, -4(%ebp)
│ ; CODE XREF from sym.power @ 0x8048092
│ ;-- power_loop_start:
│ ┌─> 0x08048086 83f901 cmpl $1, %ecx ; 1
│ ┌──< 0x08048089 7409 je loc.end_power
│ │╎ 0x0804808b 8b45fc movl var_4h, %eax
│ │╎ 0x0804808e 8945fc movl %eax, var_4h
│ │╎ 0x08048091 49 decl %ecx
│ │└─< 0x08048092 ebf2 jmp loc.power_loop_start
│ │ ; CODE XREF from sym.power @ 0x8048089
│ │ ;-- end_power:
│ └──> 0x08048094 8b45fc movl var_4h, %eax
│ 0x08048097 89ec movl %ebp, %esp
│ 0x08048099 5d popl %ebp
└ 0x0804809a c3 retl
[0x08048054]> q
$
Now strip the binary
$
$ cp power power.strip
$
$ strip power.strip
$
$ file power.strip
power.strip: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$
Load the stripped binary in r2 and list functions; r2 doesn't show an equivalent function (with address) in place of the "power" function
$
$ r2 -A ./power.strip
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
-- Yo dawg!
[0x08048054]>
[0x08048054]> afl
0x08048054 4 71 entry0 <<<<<<<<<<<<<<< (no power function or equivalent)
[0x08048054]>
[0x08048054]> pdf @ entry0
;-- section..text:
;-- eip:
┌ 71: entry0 (int32_t arg_8h, int32_t arg_ch);
│ ; var int32_t var_4h @ ebp-0x4
│ ; arg int32_t arg_8h @ ebp+0x8
│ ; arg int32_t arg_ch @ ebp+0xc
│ 0x08048054 6a03 pushl $3 ; 3 ; [01] -r-x section size 71 named .text
│ 0x08048056 6a02 pushl $2 ; 2
│ 0x08048058 e81a000000 calll 0x8048077 <<<<<<<<< (calls to power function)
│ 0x0804805d 83c408 addl $8, %esp
│ 0x08048060 50 pushl %eax
│ 0x08048061 6a02 pushl $2 ; 2
│ 0x08048063 6a05 pushl $5 ; 5
│ 0x08048065 e80d000000 calll 0x8048077 <<<<<<<<< (calls to power function)
│ 0x0804806a 83c408 addl $8, %esp
│ 0x0804806d 5b popl %ebx
│ 0x0804806e 01c3 addl %eax, %ebx
│ 0x08048070 b801000000 movl $1, %eax
│ 0x08048075 cd80 int $0x80
│ ; CALL XREFS from entry0 @ 0x8048058, 0x8048065
│ 0x08048077 55 pushl %ebp <<<<<<<<<<< (start of power function)
│ 0x08048078 89e5 movl %esp, %ebp
│ 0x0804807a 83ec04 subl $4, %esp
│ 0x0804807d 8b5d08 movl arg_8h, %ebx
│ 0x08048080 8b4d0c movl arg_ch, %ecx
│ 0x08048083 895dfc movl %ebx, var_4h
│ ; CODE XREF from entry0 @ 0x8048092
│ ┌─> 0x08048086 83f901 cmpl $1, %ecx ; 1
│ ┌──< 0x08048089 7409 je 0x8048094
│ │╎ 0x0804808b 8b45fc movl var_4h, %eax
│ │╎ 0x0804808e 8945fc movl %eax, var_4h
│ │╎ 0x08048091 49 decl %ecx
│ │└─< 0x08048092 ebf2 jmp 0x8048086
│ │ ; CODE XREF from entry0 @ 0x8048089
│ └──> 0x08048094 8b45fc movl var_4h, %eax
│ 0x08048097 89ec movl %ebp, %esp
│ 0x08048099 5d popl %ebp
└ 0x0804809a c3 retl
[0x08048054]>
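The two calll instructions in the listing above can be resolved by hand, which confirms that both point at 0x08048077 even though no function is listed there. The following is a minimal Python sketch of that arithmetic, not anything r2 runs internally; the helper name call_target is hypothetical, and the bytes are copied from the first eight instructions of the pdf output.

```python
import struct

BASE = 0x08048054  # address of the first .text byte in the listing above

# First eight instructions of entry0, copied from the pdf output.
code = bytes.fromhex(
    "6a03" "6a02" "e81a000000" "83c408" "50" "6a02" "6a05" "e80d000000"
)

def call_target(code, base, offset):
    """Resolve a near call: opcode E8 followed by a signed 32-bit displacement.

    The displacement is relative to the address of the next instruction,
    which is 5 bytes past the start of the call.
    """
    if code[offset] != 0xE8:
        raise ValueError("not a near call at this offset")
    (rel,) = struct.unpack_from("<i", code, offset + 1)
    return base + offset + 5 + rel

# Offsets 0x04 and 0x11 correspond to 0x08048058 and 0x08048065.
for off in (0x04, 0x11):
    print(hex(BASE + off), "calls", hex(call_target(code, BASE, off)))
```

Both calls resolve to the same address, the byte immediately after int $0x80.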
aaaa is only supposed to be used on standard binaries.
For your case I would use e anal.hasnext=1;afr or aac
Tried the suggested steps, did not make a difference
Stripped binary from before
$ file power.strip
power.strip: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$
Loading into r2 and enabling anal.hasnext
$ r2 ./power.strip
-- This incident will be reported
[0x08048054]>
[0x08048054]> e | grep anal.hasnext
anal.hasnext = false
[0x08048054]>
[0x08048054]> e anal.hasnext = 1
[0x08048054]>
[0x08048054]> e | grep anal.hasnext
anal.hasnext = true
[0x08048054]>
Running both afr and aac; however, afl still does not detect and list the function
[0x08048054]> afr
[0x08048054]>
[0x08048054]> afl
0x08048054 4 71 entry0
[0x08048054]>
[0x08048054]> aac
[0x08048054]>
[0x08048054]> afl
0x08048054 4 71 entry0
[0x08048054]>
We can see the function is being called multiple times in below disassembly of entry0
[0x08048054]> pdf @ entry0
;-- section..text:
;-- eip:
┌ 71: entry0 (int32_t arg_8h, int32_t arg_ch);
│ ; var int32_t var_4h @ ebp-0x4
│ ; arg int32_t arg_8h @ ebp+0x8
│ ; arg int32_t arg_ch @ ebp+0xc
│ 0x08048054 6a03 pushl $3 ; 3 ; [01] -r-x section size 71 named .text
│ 0x08048056 6a02 pushl $2 ; 2
│ 0x08048058 e81a000000 calll 0x8048077 <<<<<<<< function call
│ 0x0804805d 83c408 addl $8, %esp
│ 0x08048060 50 pushl %eax
│ 0x08048061 6a02 pushl $2 ; 2
│ 0x08048063 6a05 pushl $5 ; 5
│ 0x08048065 e80d000000 calll 0x8048077 <<<<<<<< function call
│ 0x0804806a 83c408 addl $8, %esp
│ 0x0804806d 5b popl %ebx
│ 0x0804806e 01c3 addl %eax, %ebx
│ 0x08048070 b801000000 movl $1, %eax
│ 0x08048075 cd80 int $0x80
│ ; CALL XREFS from entry0 @ 0x8048054, 0x8048058, 0x8048065
│ 0x08048077 55 pushl %ebp
│ 0x08048078 89e5 movl %esp, %ebp
│ 0x0804807a 83ec04 subl $4, %esp
│ 0x0804807d 8b5d08 movl arg_8h, %ebx
│ 0x08048080 8b4d0c movl arg_ch, %ecx
│ 0x08048083 895dfc movl %ebx, var_4h
│ ; CODE XREFS from entry0 @ 0x8048054, 0x8048092
│ ┌─> 0x08048086 83f901 cmpl $1, %ecx ; 1
│ ┌──< 0x08048089 7409 je 0x8048094
│ │╎ 0x0804808b 8b45fc movl var_4h, %eax
│ │╎ 0x0804808e 8945fc movl %eax, var_4h
│ │╎ 0x08048091 49 decl %ecx
│ │└─< 0x08048092 ebf2 jmp 0x8048086
│ └──> 0x08048094 8b45fc movl var_4h, %eax
│ 0x08048097 89ec movl %ebp, %esp
│ 0x08048099 5d popl %ebp
└ 0x0804809a c3 retl
[0x08048054]>
The issue is that the exit syscall (mov eax, 1; int 0x80) is not recognized as the end of a function, so r2 stops analyzing the function at 0x08048054 (entry0) only after reaching the ret instruction at 0x0804809a (which is actually the end of the function power). In the non-stripped binary, the function power is created forcibly during aa because there is a symbol at 0x08048077. That's why you see two functions there.
You can run af @ entry0 on both the non-stripped and stripped binaries to confirm that the issue is caused by this syscall.
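To illustrate the point, here is a rough Python sketch of what treating the exit syscall as a function terminator would look like on this binary. It is a naive byte-pattern match written for this thread, not how r2's analysis works; the name end_after_exit is hypothetical, and the bytes are the 71-byte .text section copied from the disassembly above.

```python
BASE = 0x08048054  # start of .text in power.strip

# The whole .text section (71 bytes), copied from the pdf output.
TEXT = bytes.fromhex(
    "6a036a02e81a00000083c408506a026a05e80d00000083c4085b01c3b801000000cd80"
    "5589e583ec048b5d088b4d0c895dfc83f90174098b45fc8945fc49ebf28b45fc89ec5dc3"
)

# movl $1, %eax ; int $0x80  (the simple form of the i386 Linux exit syscall)
EXIT_SEQ = bytes.fromhex("b801000000" "cd80")

def end_after_exit(code, base):
    """Return the address just past the first exit sequence, or None.

    A plain byte search can also match inside operands of other
    instructions, so a real analyzer would need to work on decoded code.
    """
    idx = code.find(EXIT_SEQ)
    return None if idx < 0 else base + idx + len(EXIT_SEQ)

end = end_after_exit(TEXT, BASE)
print("entry0 would end at", hex(end), "with size", end - BASE)
```

With that boundary, entry0 is 35 bytes long and ends at 0x08048077, which matches what afl reports for the non-stripped binary.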
@pelijah Thank you for the detailed information, I did not take the exit syscall into account.
My understanding was that the function prologue and epilogue instructions (below) should be enough to identify a function in a stripped binary. However, since this is a handwritten binary it shouldn't matter that much (other than for CTFs); most compiled standard binaries will have the required information in place.
So I think this isn't really a bug and can be closed?
# function entry
pushl %ebp
movl %esp, %ebp
# function exit
movl %ebp, %esp
popl %ebp
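As a rough illustration of the prologue idea, the sketch below scans the stripped .text bytes for the encoding of pushl %ebp; movl %esp, %ebp (55 89 e5, as seen at 0x08048077 in the disassembly). This is a hypothetical heuristic written in Python for this thread, not a description of r2's prelude search; the name find_prologues is made up for the example.

```python
BASE = 0x08048054  # start of .text in power.strip

# The whole .text section (71 bytes), copied from the pdf output.
TEXT = bytes.fromhex(
    "6a036a02e81a00000083c408506a026a05e80d00000083c4085b01c3b801000000cd80"
    "5589e583ec048b5d088b4d0c895dfc83f90174098b45fc8945fc49ebf28b45fc89ec5dc3"
)

# pushl %ebp ; movl %esp, %ebp
PROLOGUE = bytes.fromhex("5589e5")

def find_prologues(code, base):
    """Return every address where the prologue byte pattern occurs."""
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits

print([hex(addr) for addr in find_prologues(TEXT, BASE)])
```

The only hit is 0x08048077. Note that _start itself has no such prologue, so a prologue scan alone would not find every function in handwritten assembly.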
It actually is a bug, because the exit syscall in the simple form movl $1, %eax; int $0x80 should be detected and interpreted as a no_return function.