bcc
libbpf-tools: Fix trace_helper symbol search bug
The dso__find_sym function assumes that the ELF file is not stripped, so it can return the wrong symbol when the target ELF file is stripped. Suppose the functions 'foo', 'bar', and 'baz' start at addresses '0x1000', '0x4000', and '0x9000' respectively, and 'bar' has been stripped because it is a static function. If a user looks up the symbol for address '0x4080', the user expects 'bar', and the expected result of dso__find_sym is NULL (the symbol no longer exists), but the actual result is 'foo'.
To fix this problem, dso__find_sym now uses the symbol size to check whether the offset falls within the range of the found symbol.
@Bojun-Seo you mentioned a case where static function symbols are stripped. I may have missed it, but what is the command line that strips a static function symbol while keeping the global symbols? Why do you want to strip static functions only? I am not saying your change is incorrect, I just want to understand the motivation better.
@yonghong-song The command to strip the target binary is "strip". Please refer to this web page: https://man7.org/linux/man-pages/man1/strip.1.html Binaries are usually stripped (removing the symbols of static functions) to save storage on embedded systems. A static function can be called only by functions in the same source file, which know the relative position of the static function. This means the symbol of a static function is not needed to execute the binary.
@Bojun-Seo I am aware of the strip or llvm-strip binary. I am asking what command line you use to strip only static functions.
@yonghong-song I use the strip command to strip binaries to reduce their size. I have not used, and do not know of, any command that removes only static functions. I mentioned static functions to explain the reason for this PR.
@eiffel-fl (Just so I remember to review it.)
The following steps reproduce the problem I mentioned.
Environment
On x86_64 architecture
$ cat /etc/issue.net
Ubuntu 22.04.1 LTS
$ uname -r
5.15.0-52-generic
File to test
$ cat pr4158_test.c
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>

int* foo();
void bar(int* p);
void func(int* p);
void baz(int* p);

int main(int argc, char* argv[]) {
    sleep(50);
    int *val = foo();
    *val = 33;
    bar(val);
    *val = 84;
    baz(val);
    return 0;
}

void func(int* p) {
    while (true) {
        if (p != NULL) {
            printf("free %d\n", *p);
            free(p);
            break;
        }
    }
}

int* foo() {
    return (int*)malloc(sizeof(int));
}

void bar(int* p) {
    printf("bar: %p\n", p);
    free(p);
}

void baz(int* p) {
    printf("baz: %p\n", p);
    printf("bazz: %d\n", *p);
    func(p);
}
Compile and run doublefree for test
Check out the following PR and compile doublefree:
https://github.com/iovisor/bcc/pull/4286
$ make doublefree
$ gcc pr4158_test.c
$ ./a.out &
[1] 18502
$ sudo ./doublefree -p 18502
<... skip some logs from a.out ...>
Found double free...
Allocation happended on:
stack_id: 15300
#1 0x00563df0bbe219 foo
#2 0x00563df0bbe1d0 main
#3 0x007f7db807bd90 __libc_init_first
First deallocation happended on:
stack_id: 18239
#1 0x007f7db80f7460 free
#2 0x00563df0bbe1ea main
#3 0x007f7db807bd90 __libc_init_first
Second deallocation happended on:
stack_id: 38034
#1 0x007f7db80f7460 free
#2 0x00563df0bbe2eb func
#3 0x00563df0bbe200 main
#4 0x007f7db807bd90 __libc_init_first
This shows that the correct symbols are printed.
But suppose I strip the binary of the test program (stripping only foo, for this test):
$ strip -N foo a.out
When I test again, func is printed instead of foo in the allocation stack backtrace!
Found double free...
Allocation happended on:
stack_id: 64266
#1 0x00555c0c90f25c func
#2 0x00555c0c90f1d0 main
#3 0x007f5a2de39d90 __libc_init_first
First deallocation happended on:
stack_id: 29131
#1 0x007f5a2deb5460 free
#2 0x00555c0c90f1ea main
#3 0x007f5a2de39d90 __libc_init_first
Second deallocation happended on:
stack_id: 53922
#1 0x007f5a2deb5460 free
#2 0x00555c0c90f2eb baz
#3 0x00555c0c90f200 main
#4 0x007f5a2de39d90 __libc_init_first
It is natural that foo is not printed (because it is stripped).
But it should not print another name (func doesn't call malloc).
func is printed because it is located just ahead of foo.
I tested it with the doublefree tool, but any other tool run against a stripped binary can hit this issue.
I tested this commit on top of doublefree and here is the output I am getting:
Allocation happended on:
stack_id: 43722
#1 0x00561cdbf0525c
#2 0x00561cdbf051d0 main
#3 0x007fa494429d90
Am I doing something wrong?
@eiffel-fl I will share what I did. Can you check the difference between my steps and yours?
$ cat /etc/issue.net
Ubuntu 22.04.3 LTS
$ uname -r
6.2.0-32-generic
$ git clone https://github.com/Bojun-Seo/bcc.git -b doublefree
$ cd bcc
# HASH value of this commit: 777236819aa6429ef0fc6c5cc9b42918b5b7b4e7
$ git cherry-pick 777236819aa6429ef0fc6c5cc9b42918b5b7b4e7
$ cd bcc
$ mkdir build
$ cd build
$ cmake ..
$ cd ../libbpf-tools/
$ make doublefree
# build test program
$ cat pr4158_test.c
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>

int* foo();
void bar(int* p);
void func(int* p);
void baz(int* p);

int main(int argc, char* argv[]) {
    sleep(50);
    int *val = foo();
    *val = 33;
    bar(val);
    *val = 84;
    baz(val);
    return 0;
}

void func(int* p) {
    while (true) {
        if (p != NULL) {
            printf("free %d\n", *p);
            free(p);
            break;
        }
    }
}

int* foo() {
    return (int*)malloc(sizeof(int));
}

void bar(int* p) {
    printf("bar: %p\n", p);
    free(p);
}

void baz(int* p) {
    printf("baz: %p\n", p);
    printf("bazz: %d\n", *p);
    func(p);
}
$ gcc pr4158_test.c
$ strip -N foo a.out
$ ./a.out &
[1] 30188
$ sudo ./doublefree -p 30188
#1 Found double free...
Allocation happended on stack_id: 45630
#1 0x00557fd8de625c (/home/bojun/doublefree/a.out+0x125c)
#2 0x00557fd8de61d0 main+0x27 (/home/bojun/doublefree/a.out+0x11d0)
#3 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
First deallocation happended on stack_id: 28822
#1 0x007f3a374a5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
#2 0x00557fd8de61ea main+0x41 (/home/bojun/doublefree/a.out+0x11ea)
#3 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
Second deallocation happended on stack_id: 52130
#1 0x007f3a374a5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
#2 0x00557fd8de62eb baz+0x53 (/home/bojun/doublefree/a.out+0x12eb)
#3 0x00557fd8de6200 main+0x57 (/home/bojun/doublefree/a.out+0x1200)
#4 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
I tested again with the cherry-pick and everything is fine:
#1 Found double free...
Allocation happended on stack_id: 7738
#1 0x0055afe6b0925c (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x125c)
#2 0x0055afe6b091d0 main+0x27 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x11d0)
#3 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
First deallocation happended on stack_id: 43624
#1 0x007fd0a4ea5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
#2 0x0055afe6b091ea main+0x41 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x11ea)
#3 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
Second deallocation happended on stack_id: 52007
#1 0x007fd0a4ea5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
#2 0x0055afe6b092eb baz+0x53 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x12eb)
#3 0x0055afe6b09200 main+0x57 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x1200)
#4 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
I must have forgotten a step when I tested previously.
@yonghong-song I updated the description and commit messages, and added a way to reproduce the bug with memleak. I hope you can review this one-line bug fix patch.
This is your last question: "I am asking what is the command line you use to only strip static functions."
And my answer is "I didn't use any command line to strip only static functions, and I don't know what command could strip only static functions."
Since the libc.so.6 installed on x86_64 Ubuntu 22.04.3 LTS is stripped, we can easily reproduce this issue with memleak using the following simple test program.
Build memleak in the bcc/libbpf-tools folder
$ make memleak
File test.c
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    int *a;
    while (1) {
        sleep(5);
        a = malloc(4);
    }
    return 0;
}
Compile and run
$ gcc test.c
$ ./a.out &
[1] 338930
$ sudo ./memleak -p 338930
using default object: libc.so.6
using page size: 4096
tracing kernel: false
Tracing outstanding memory allocs... Hit Ctrl-C to end
[12:55:36] Top 1 stacks with outstanding allocations:
4 bytes in 1 allocations from stack
0 [<0000564933f76190>] main+0x27 [/home/bojun/bcc/libbpf-tools/a.out]
1 [<00007f579c629d90>] __libc_init_first+0x90 [/usr/lib/x86_64-linux-gnu/libc.so.6]
^C[12:55:38] Top 1 stacks with outstanding allocations:
4 bytes in 1 allocations from stack
0 [<0000564933f76190>] main+0x27 [/home/bojun/bcc/libbpf-tools/a.out]
1 [<00007f579c629d90>] __libc_init_first+0x90 [/usr/lib/x86_64-linux-gnu/libc.so.6]
done
Here we can see that the reported symbol, __libc_init_first, is incorrect. The report says that the symbol offset is 0x90, and the start address of __libc_init_first is 0x29d00. Adding these two values gives 0x29d90, which is an address inside the __libc_start_call_main function.
$ objdump -D /usr/lib/x86_64-linux-gnu/libc.so.6
... snip ...
0000000000029d00 <__libc_init_first>:
29d00: f3 0f 1e fa endbr64
29d04: c3 ret
29d05: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
29d0c: 00 00 00
29d0f: 90 nop
0000000000029d10 <__libc_start_call_main>:
29d10: 50 push %rax
29d11: 58 pop %rax
29d12: 48 81 ec 98 00 00 00 sub $0x98,%rsp
29d19: 48 89 7c 24 08 mov %rdi,0x8(%rsp)
29d1e: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
29d23: 89 74 24 14 mov %esi,0x14(%rsp)
29d27: 48 89 54 24 18 mov %rdx,0x18(%rsp)
29d2c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
29d33: 00 00
29d35: 48 89 84 24 88 00 00 mov %rax,0x88(%rsp)
29d3c: 00
29d3d: 31 c0 xor %eax,%eax
29d3f: e8 9c 84 01 00 call 421e0 <_setjmp>
29d44: f3 0f 1e fa endbr64
29d48: 85 c0 test %eax,%eax
29d4a: 75 4b jne 29d97 <__libc_start_call_main+0x87>
29d4c: 64 48 8b 04 25 00 03 mov %fs:0x300,%rax
29d53: 00 00
29d55: 48 89 44 24 68 mov %rax,0x68(%rsp)
29d5a: 64 48 8b 04 25 f8 02 mov %fs:0x2f8,%rax
29d61: 00 00
29d63: 48 89 44 24 70 mov %rax,0x70(%rsp)
29d68: 48 8d 44 24 20 lea 0x20(%rsp),%rax
29d6d: 64 48 89 04 25 00 03 mov %rax,%fs:0x300
29d74: 00 00
29d76: 48 8b 05 3b f2 1e 00 mov 0x1ef23b(%rip),%rax # 218fb8 <__environ@@GLIBC_2.2.5-0x8248>
29d7d: 8b 7c 24 14 mov 0x14(%rsp),%edi
29d81: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
29d86: 48 8b 10 mov (%rax),%rdx
29d89: 48 8b 44 24 08 mov 0x8(%rsp),%rax
29d8e: ff d0 call *%rax
29d90: 89 c7 mov %eax,%edi
29d92: e8 59 b8 01 00 call 455f0 <exit>
29d97: e8 54 78 06 00 call 915f0 <__GI___nptl_deallocate_tsd>
29d9c: f0 ff 0d 05 f5 1e 00 lock decl 0x1ef505(%rip) # 2192a8 <__nptl_nthreads>
29da3: 0f 94 c0 sete %al
29da6: 84 c0 test %al,%al
29da8: 75 0e jne 29db8 <__libc_start_call_main+0xa8>
29daa: ba 3c 00 00 00 mov $0x3c,%edx
29daf: 90 nop
29db0: 31 ff xor %edi,%edi
29db2: 89 d0 mov %edx,%eax
29db4: 0f 05 syscall
29db6: eb f8 jmp 29db0 <__libc_start_call_main+0xa0>
29db8: 31 ff xor %edi,%edi
29dba: eb d6 jmp 29d92 <__libc_start_call_main+0x82>
29dbc: 0f 1f 40 00 nopl 0x0(%rax)
... snip ...
@eiffel-fl Good suggestion! And thanks for double-checking this issue. A symbol can be resolved as long as it exists somewhere, but it is impossible to find something that does not exist.
In this case, we can find the symbol manually: /usr/lib/x86_64-linux-gnu/libc.so.6 has a .gnu_debuglink section.
$ readelf -S /usr/lib/x86_64-linux-gnu/libc.so.6
... snip ...
[64] .gnu_debuglink PROGBITS 0000000000000000 0021bc74
0000000000000034 0000000000000000 0 0 4
... snip ...
The .gnu_debuglink section contains the path (or link) of the file that holds the debug info, so we can find that file path with the following command.
$ dwarfdump -i /usr/lib/x86_64-linux-gnu/libc.so.6 | head -1
Filename by debuglink is /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug
We can use grep to confirm that this file contains __libc_start_call_main:
$ grep __libc_start_call_main /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug
grep: /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug: binary file matches
I guess the objdump tool uses this information to show the results I attached in my previous comment.
Hi!
@eiffel-fl Good suggestion! And thanks for double check this issue. There is a way to get symbol if there is a symbol at any rate. But it is impossible to find something which is not exist.
OK! It makes sense then! Thank you for the explanation.
Best regards.
@ethercflow I found that you added the code I'm trying to change, so I would appreciate your review of this PR.