libbpf-tools: Fix trace_helper symbol search bug

Open Bojun-Seo opened this issue 2 years ago • 5 comments

The dso__find_sym function assumes the ELF file is not stripped, so it can return the wrong symbol when the target ELF file is stripped. Suppose the functions 'foo', 'bar', and 'baz' start at '0x1000', '0x4000', and '0x9000' respectively, and 'bar' has been stripped because it is a static function. If the user looks up the symbol for address '0x4080', the user expects 'bar' and dso__find_sym should return NULL, but it actually returns 'foo'.

To fix this problem, dso__find_sym now uses the symbol size to check whether the offset falls within the range of the symbol it found.
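A minimal sketch of the idea, not the actual patch. The struct, the function name find_sym, and the symbol sizes are hypothetical and only mirror the foo/bar/baz example above; the real code in trace_helpers.c is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record (not the real libbpf-tools struct). */
struct sym_entry {
	const char *name;
	uint64_t start;
	uint64_t size;
};

/*
 * Return the symbol whose [start, start + size) range contains addr,
 * or NULL when addr falls into a gap. syms must be sorted by start.
 */
static const struct sym_entry *
find_sym(const struct sym_entry *syms, size_t n, uint64_t addr)
{
	const struct sym_entry *s;
	size_t lo = 0, hi = n;

	assert(syms != NULL || n == 0);

	/* Binary search for the last symbol with start <= addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].start <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;	/* addr is below the first symbol */

	s = &syms[lo - 1];

	/* The size check: without it, an address in stripped 'bar' resolves to 'foo'. */
	if (addr - s->start >= s->size)
		return NULL;
	return s;
}

/* 'bar' (0x4000) is missing, as if it had been stripped. Sizes are made up. */
static const struct sym_entry stripped_syms[] = {
	{ "foo", 0x1000, 0x100 },
	{ "baz", 0x9000, 0x200 },
};
```

With this table, find_sym(stripped_syms, 2, 0x4080) returns NULL instead of the entry for 'foo', which is the behavior described above.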

Bojun-Seo avatar Aug 11 '22 03:08 Bojun-Seo

@Bojun-Seo you mentioned a case where static function symbols are stripped. I may have missed it, but what is the command line to strip a static function symbol while keeping the global symbols? Why do you want to strip only static functions? I am not saying your change is incorrect, I just want to understand the motivation better.

yonghong-song avatar Aug 14 '22 04:08 yonghong-song

@yonghong-song The command to strip the target binary is "strip". Please refer to this web page: https://man7.org/linux/man-pages/man1/strip.1.html Binaries are usually stripped (removing the symbols of static functions) to save storage on embedded systems. A static function can be called only by functions in the same source file, which already know its relative position. This means the symbol of a static function is not needed to execute the binary.

Bojun-Seo avatar Aug 16 '22 01:08 Bojun-Seo

@Bojun-Seo I am aware of the strip and llvm-strip binaries. I am asking what is the command line you use to only strip static functions.

yonghong-song avatar Aug 28 '22 19:08 yonghong-song

@yonghong-song I use the strip command to reduce the size of binaries. I didn't use, and don't know of, any command that removes only static functions. I mentioned static functions to explain the reason for this PR.

Bojun-Seo avatar Aug 29 '22 00:08 Bojun-Seo

@eiffel-fl (Just so I remember to review it.)

eiffel-fl avatar Sep 29 '22 16:09 eiffel-fl

The following steps reproduce the problem I mentioned.

Environment

On x86_64 architecture
$ cat /etc/issue.net
Ubuntu 22.04.1 LTS

$ uname -r
5.15.0-52-generic

File to test

$ cat pr4158_test.c
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

int* foo();
void bar(int* p);
void func(int* p);
void baz(int* p);

int main(int argc, char* argv[]) {
  sleep(50);
  int *val = foo();
  *val = 33;
  bar(val);
  *val = 84;
  baz(val);
  return 0;
}

void func(int* p) {
  while (true) {
    if (p != NULL) {
      printf("free %d\n", *p);
      free(p);
      break;
    }
  }
}

int* foo() {
  return (int*)malloc(sizeof(int));
}

void bar(int* p) {
  printf("bar: %p\n", p);
  free(p);
}

void baz(int* p) {
  printf("baz: %p\n", p);
  printf("bazz: %d\n", *p);
  func(p);
}

Compile and run doublefree for the test. Check out the following PR and compile doublefree: https://github.com/iovisor/bcc/pull/4286

$ make doublefree
$ gcc pr4158_test.c
$ ./a.out &
[1] 18502
$ sudo ./doublefree -p 18502
<... skip some logs from a.out ...>
Found double free...
Allocation happended on:
stack_id: 15300
        #1 0x00563df0bbe219 foo
        #2 0x00563df0bbe1d0 main
        #3 0x007f7db807bd90 __libc_init_first

First deallocation happended on:
stack_id: 18239
        #1 0x007f7db80f7460 free
        #2 0x00563df0bbe1ea main
        #3 0x007f7db807bd90 __libc_init_first

Second deallocation happended on:
stack_id: 38034
        #1 0x007f7db80f7460 free
        #2 0x00563df0bbe2eb func
        #3 0x00563df0bbe200 main
        #4 0x007f7db807bd90 __libc_init_first

This shows that the correct symbols are printed. Now I strip the binary of the test program (removing only foo, for the test):

$ strip -N foo a.out

When I test again, func is printed instead of foo in the allocation stack backtrace!

Found double free...
Allocation happended on:
stack_id: 64266
        #1 0x00555c0c90f25c func
        #2 0x00555c0c90f1d0 main
        #3 0x007f5a2de39d90 __libc_init_first


First deallocation happended on:
stack_id: 29131
        #1 0x007f5a2deb5460 free
        #2 0x00555c0c90f1ea main
        #3 0x007f5a2de39d90 __libc_init_first


Second deallocation happended on:
stack_id: 53922
        #1 0x007f5a2deb5460 free
        #2 0x00555c0c90f2eb baz
        #3 0x00555c0c90f200 main
        #4 0x007f5a2de39d90 __libc_init_first

It is natural that foo is not printed (because it is stripped). But it should not print another name (func doesn't call malloc). func is printed because it is located just ahead of foo.

I tested this with the doublefree tool, but any other tool run against a stripped binary can hit this issue.

Bojun-Seo avatar Oct 24 '22 07:10 Bojun-Seo

I tested this commit on top of doublefree and here is the output I am getting:

Allocation happended on:
stack_id: 43722
        #1 0x00561cdbf0525c
        #2 0x00561cdbf051d0 main
        #3 0x007fa494429d90

Am I doing something wrong?

eiffel-fl avatar Sep 01 '23 14:09 eiffel-fl

@eiffel-fl I will share what I did. Can you check the difference between my steps and yours?

$ cat /etc/issue.net
Ubuntu 22.04.3 LTS
$ uname -r
6.2.0-32-generic
$ git clone https://github.com/Bojun-Seo/bcc.git -b doublefree
$ cd bcc
# HASH value of this commit: 777236819aa6429ef0fc6c5cc9b42918b5b7b4e7
$ git cherry-pick 777236819aa6429ef0fc6c5cc9b42918b5b7b4e7
$ cd bcc
$ mkdir build
$ cd build
$ cmake ..
$ cd ../libbpf-tools/
$ make doublefree
# build test program
$ cat pr4158_test.c
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>

int* foo();
void bar(int* p);
void func(int* p);
void baz(int* p);

int main(int argc, char* argv[]) {
  sleep(50);
  int *val = foo();
  *val = 33;
  bar(val);
  *val = 84;
  baz(val);
  return 0;
}

void func(int* p) {
  while (true) {
    if (p != NULL) {
      printf("free %d\n", *p);
      free(p);
      break;
    }
  }
}

int* foo() {
  return (int*)malloc(sizeof(int));
}

void bar(int* p) {
  printf("bar: %p\n", p);
  free(p);
}

void baz(int* p) {
  printf("baz: %p\n", p);
  printf("bazz: %d\n", *p);
  func(p);
}
$ gcc pr4158_test.c
$ strip -N foo a.out
$ ./a.out &
[1] 30188
$ sudo ./doublefree -p 30188
#1 Found double free...
Allocation happended on stack_id: 45630
        #1 0x00557fd8de625c (/home/bojun/doublefree/a.out+0x125c)
        #2 0x00557fd8de61d0 main+0x27 (/home/bojun/doublefree/a.out+0x11d0)
        #3 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)


First deallocation happended on stack_id: 28822
        #1 0x007f3a374a5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
        #2 0x00557fd8de61ea main+0x41 (/home/bojun/doublefree/a.out+0x11ea)
        #3 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)


Second deallocation happended on stack_id: 52130
        #1 0x007f3a374a5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
        #2 0x00557fd8de62eb baz+0x53 (/home/bojun/doublefree/a.out+0x12eb)
        #3 0x00557fd8de6200 main+0x57 (/home/bojun/doublefree/a.out+0x1200)
        #4 0x007f3a37429d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)

Bojun-Seo avatar Sep 11 '23 01:09 Bojun-Seo

I tested again with the cherry-pick and everything is fine:

#1 Found double free...
Allocation happended on stack_id: 7738
        #1 0x0055afe6b0925c (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x125c)
        #2 0x0055afe6b091d0 main+0x27 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x11d0)
        #3 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)


First deallocation happended on stack_id: 43624
        #1 0x007fd0a4ea5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
        #2 0x0055afe6b091ea main+0x41 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x11ea)
        #3 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)


Second deallocation happended on stack_id: 52007
        #1 0x007fd0a4ea5460 free+0 (/usr/lib/x86_64-linux-gnu/libc.so.6+0xa5460)
        #2 0x0055afe6b092eb baz+0x53 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x12eb)
        #3 0x0055afe6b09200 main+0x57 (/home/francis/Codes/kinvolk/bcc/libbpf-tools/a.out+0x1200)
        #4 0x007fd0a4e29d90 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x29d90)

I must have forgotten a step when I tested previously.

eiffel-fl avatar Sep 18 '23 14:09 eiffel-fl

@yonghong-song I updated the description and commit messages, and added a way to reproduce the bug with memleak. I hope you can review this one-line bug fix patch. Your last question was: "I am asking what is the command line you use to only strip static functions." My answer is: "I didn't use any command line to strip only static functions, and I don't know what command could strip only static functions."

Bojun-Seo avatar Dec 11 '23 08:12 Bojun-Seo

Since the libc.so.6 installed on x86_64 Ubuntu 22.04.3 LTS is stripped, we can easily reproduce this issue with memleak using the following simple test program.

Build memleak in the bcc/libbpf-tools folder

$ make memleak

File test.c

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
  int *a;
  while (1) {
    sleep(5);
    a = malloc(4);
  }
  return 0;
}

Compile and run

$ gcc test.c
$ ./a.out &
[1] 338930
$ sudo ./memleak -p 338930
using default object: libc.so.6
using page size: 4096
tracing kernel: false
Tracing outstanding memory allocs...  Hit Ctrl-C to end
[12:55:36] Top 1 stacks with outstanding allocations:
4 bytes in 1 allocations from stack
        0 [<0000564933f76190>] main+0x27 [/home/bojun/bcc/libbpf-tools/a.out]
        1 [<00007f579c629d90>] __libc_init_first+0x90 [/usr/lib/x86_64-linux-gnu/libc.so.6]
^C[12:55:38] Top 1 stacks with outstanding allocations:
4 bytes in 1 allocations from stack
        0 [<0000564933f76190>] main+0x27 [/home/bojun/bcc/libbpf-tools/a.out]
        1 [<00007f579c629d90>] __libc_init_first+0x90 [/usr/lib/x86_64-linux-gnu/libc.so.6]
done

Here we can see that the symbol __libc_init_first is incorrect. The report says the symbol offset is 0x90, and __libc_init_first starts at 0x29d00. Adding these two values gives 0x29d90, which is an address inside the __libc_start_call_main function.

$ objdump -D /usr/lib/x86_64-linux-gnu/libc.so.6
... snip ...
0000000000029d00 <__libc_init_first>:
   29d00:       f3 0f 1e fa             endbr64
   29d04:       c3                      ret
   29d05:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
   29d0c:       00 00 00
   29d0f:       90                      nop

0000000000029d10 <__libc_start_call_main>:
   29d10:       50                      push   %rax
   29d11:       58                      pop    %rax
   29d12:       48 81 ec 98 00 00 00    sub    $0x98,%rsp
   29d19:       48 89 7c 24 08          mov    %rdi,0x8(%rsp)
   29d1e:       48 8d 7c 24 20          lea    0x20(%rsp),%rdi
   29d23:       89 74 24 14             mov    %esi,0x14(%rsp)
   29d27:       48 89 54 24 18          mov    %rdx,0x18(%rsp)
   29d2c:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
   29d33:       00 00
   29d35:       48 89 84 24 88 00 00    mov    %rax,0x88(%rsp)
   29d3c:       00
   29d3d:       31 c0                   xor    %eax,%eax
   29d3f:       e8 9c 84 01 00          call   421e0 <_setjmp>
   29d44:       f3 0f 1e fa             endbr64
   29d48:       85 c0                   test   %eax,%eax
   29d4a:       75 4b                   jne    29d97 <__libc_start_call_main+0x87>
   29d4c:       64 48 8b 04 25 00 03    mov    %fs:0x300,%rax
   29d53:       00 00
   29d55:       48 89 44 24 68          mov    %rax,0x68(%rsp)
   29d5a:       64 48 8b 04 25 f8 02    mov    %fs:0x2f8,%rax
   29d61:       00 00
   29d63:       48 89 44 24 70          mov    %rax,0x70(%rsp)
   29d68:       48 8d 44 24 20          lea    0x20(%rsp),%rax
   29d6d:       64 48 89 04 25 00 03    mov    %rax,%fs:0x300
   29d74:       00 00
   29d76:       48 8b 05 3b f2 1e 00    mov    0x1ef23b(%rip),%rax        # 218fb8 <__environ@@GLIBC_2.2.5-0x8248>
   29d7d:       8b 7c 24 14             mov    0x14(%rsp),%edi
   29d81:       48 8b 74 24 18          mov    0x18(%rsp),%rsi
   29d86:       48 8b 10                mov    (%rax),%rdx
   29d89:       48 8b 44 24 08          mov    0x8(%rsp),%rax
   29d8e:       ff d0                   call   *%rax
   29d90:       89 c7                   mov    %eax,%edi
   29d92:       e8 59 b8 01 00          call   455f0 <exit>
   29d97:       e8 54 78 06 00          call   915f0 <__GI___nptl_deallocate_tsd>
   29d9c:       f0 ff 0d 05 f5 1e 00    lock decl 0x1ef505(%rip)        # 2192a8 <__nptl_nthreads>
   29da3:       0f 94 c0                sete   %al
   29da6:       84 c0                   test   %al,%al
   29da8:       75 0e                   jne    29db8 <__libc_start_call_main+0xa8>
   29daa:       ba 3c 00 00 00          mov    $0x3c,%edx
   29daf:       90                      nop
   29db0:       31 ff                   xor    %edi,%edi
   29db2:       89 d0                   mov    %edx,%eax
   29db4:       0f 05                   syscall
   29db6:       eb f8                   jmp    29db0 <__libc_start_call_main+0xa0>
   29db8:       31 ff                   xor    %edi,%edi
   29dba:       eb d6                   jmp    29d92 <__libc_start_call_main+0x82>
   29dbc:       0f 1f 40 00             nopl   0x0(%rax)
... snip ...
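The arithmetic can be double-checked at compile time with a small sketch. The start addresses and the 0x90 offset come from the memleak report and the objdump listing above; the macro names are mine, and the size bound assumes __libc_init_first ends no later than where __libc_start_call_main begins. It needs a C11 compiler for static_assert.

```c
#include <assert.h>
#include <stdint.h>

/* Start addresses copied from the objdump listing above. */
#define INIT_FIRST_START	UINT64_C(0x29d00)	/* __libc_init_first */
#define START_CALL_MAIN_START	UINT64_C(0x29d10)	/* __libc_start_call_main */

/* memleak printed the frame as __libc_init_first+0x90. */
#define REPORTED_OFFSET		UINT64_C(0x90)
#define FRAME_ADDR		(INIT_FIRST_START + REPORTED_OFFSET)

/* Assumed upper bound: the symbol cannot extend past the next function. */
#define INIT_FIRST_MAX_SIZE	(START_CALL_MAIN_START - INIT_FIRST_START)

/* Start plus offset gives the address seen in the listing. */
static_assert(FRAME_ADDR == UINT64_C(0x29d90), "frame address");

/* __libc_init_first is at most 0x10 bytes, so offset 0x90 lies outside it. */
static_assert(INIT_FIRST_MAX_SIZE == 0x10, "upper bound on symbol size");
static_assert(REPORTED_OFFSET >= INIT_FIRST_MAX_SIZE,
	      "offset is outside __libc_init_first");

/* Relative to the function that contains it, the frame is at +0x80. */
static_assert(FRAME_ADDR - START_CALL_MAIN_START == 0x80,
	      "offset in __libc_start_call_main");
```

The last assertion agrees with the listing, where 0x29d92 is labelled __libc_start_call_main+0x82.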

Bojun-Seo avatar Dec 12 '23 04:12 Bojun-Seo

@eiffel-fl Good suggestion! And thanks for double-checking this issue. There is a way to get the symbol if a symbol exists at all. But it is impossible to find something that does not exist.

In this case, we can find the symbol manually. /usr/lib/x86_64-linux-gnu/libc.so.6 has a .gnu_debuglink section.

$ readelf -S /usr/lib/x86_64-linux-gnu/libc.so.6
... snip ...
  [64] .gnu_debuglink    PROGBITS         0000000000000000  0021bc74
       0000000000000034  0000000000000000           0     0     4
... snip ...

The .gnu_debuglink section names the file (or link) that contains the debug info, so we can find the file path with the following command.

$ dwarfdump -i /usr/lib/x86_64-linux-gnu/libc.so.6 | head -1
Filename by debuglink is /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug

We can use grep to confirm that this file contains __libc_start_call_main:

$ grep __libc_start_call_main /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug
grep: /usr/lib/debug/.build-id/20/3de0ae33b53fee1578b117cb4123e85d0534f0.debug: binary file matches

I guess the objdump tool uses this information to show the results I attached in my previous comment.
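For reference, the .gnu_debuglink section mentioned above has a simple layout: a NUL-terminated file name, zero padding up to the next 4-byte boundary, and a 4-byte CRC32 of the debug file. The following is a rough sketch of parsing those raw bytes. The struct, the function name parse_debuglink, and the sample bytes are hypothetical, and the CRC is read as little-endian, which is an assumption that fits x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct debuglink {
	const char *name;	/* NULL when the section is malformed */
	uint32_t crc;		/* CRC32 of the separate debug file */
};

/* Parse the raw contents of a .gnu_debuglink section. */
static struct debuglink
parse_debuglink(const unsigned char *data, size_t len)
{
	struct debuglink out = { NULL, 0 };
	const unsigned char *nul;
	size_t crc_off;

	assert(data != NULL);

	/* The file name must be NUL-terminated inside the section. */
	nul = memchr(data, '\0', len);
	if (nul == NULL)
		return out;

	/* The CRC starts at the next 4-byte boundary after the NUL. */
	crc_off = ((size_t)(nul - data) + 1 + 3) & ~(size_t)3;
	if (crc_off > len || len - crc_off < 4)
		return out;	/* no room for the CRC */

	out.name = (const char *)data;
	out.crc = (uint32_t)data[crc_off] |
		  (uint32_t)data[crc_off + 1] << 8 |
		  (uint32_t)data[crc_off + 2] << 16 |
		  (uint32_t)data[crc_off + 3] << 24;
	return out;
}

/* Made-up section contents: "a.debug", NUL, then CRC 0x12345678. */
static const unsigned char sample_debuglink[] = {
	'a', '.', 'd', 'e', 'b', 'u', 'g', '\0',
	0x78, 0x56, 0x34, 0x12,
};
```

Note that the section stores only a file name. Tools such as debuggers then search for it in well-known locations like /usr/lib/debug, which this sketch does not attempt.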

Bojun-Seo avatar Dec 13 '23 00:12 Bojun-Seo

Hi!

@eiffel-fl Good suggestion! And thanks for double-checking this issue. There is a way to get the symbol if a symbol exists at all. But it is impossible to find something that does not exist.

OK! It makes sense then! Thank you for the explanation.

Best regards.

eiffel-fl avatar Dec 13 '23 09:12 eiffel-fl

@ethercflow I found that you added the code I'm trying to change, so I hope you can review this PR.

Bojun-Seo avatar Dec 15 '23 06:12 Bojun-Seo