bpftune

Segmentation fault on Ubuntu 22.04.2 LTS

Open andrey-admin opened this issue 2 years ago • 32 comments

Hello,

Got Segmentation fault (core dumped) when trying to run bpftune on Linux 5.19.0-1026-gcp kernel with:

[949460.456403] bpftune[82605]: segfault at 0 ip 00007f146066fde2 sp 00007fff34453140 error 4 in tcp_cong_tuner.so[7f146066f000+2000]
[949460.456415] Code: 45 a8 0f b6 40 40 83 f0 01 84 c0 0f 84 97 00 00 00 e8 c3 f7 ff ff 48 89 45 c8 48 8b 45 a8 48 8b 55 c8 48 89 50 48 48 8b 45 c8 <48> 8b 10 48 8b 45 a8 48 89 50 38 e8 8e f4 ff ff 48 8b 55 c8 48 8b

in dmesg.

How can I fix that?

Thanks.

andrey-admin avatar Jul 10 '23 11:07 andrey-admin

thanks for reporting - can you retry with the latest main branch? I ran into a segmentation fault on ubuntu and pushed a fix that resolved it. if it's still there, can you attach the stack associated with the core dump from gdb and i'll try and figure out what's going on.

alan-maguire avatar Jul 10 '23 12:07 alan-maguire

@alan-maguire same issue but with that in dmesg:

[953685.243427] bpftune[85946]: segfault at 0 ip 00007f29a426fde2 sp 00007ffc218d6980 error 4 in tcp_cong_tuner.so[7f29a426f000+2000]
[953685.243437] Code: 45 a8 0f b6 40 40 83 f0 01 84 c0 0f 84 97 00 00 00 e8 c3 f7 ff ff 48 89 45 c8 48 8b 45 a8 48 8b 55 c8 48 89 50 48 48 8b 45 c8 <48> 8b 10 48 8b 45 a8 48 89 50 38 e8 8e f4 ff ff 48 8b 55 c8 48 8b

andrey-admin avatar Jul 10 '23 12:07 andrey-admin

dump.zip

andrey-admin avatar Jul 10 '23 13:07 andrey-admin

thanks! i can't reproduce it so can you try running "gdb bpftune ", and once in gdb run "bt" to get a stack backtrace? you might need to run "sudo sysctl -w kernel.core_pattern=core.%f.%p" first to get core files of form core.bpftune..

alan-maguire avatar Jul 10 '23 13:07 alan-maguire

https://pastebin.com/76XQ30ac

andrey-admin avatar Jul 10 '23 13:07 andrey-admin

thanks; the crash is happening in bpftuner_bpf_init(); would you be able to run "bpftune -ds" so we can see what is happening with bpf open/load/attach?

alan-maguire avatar Jul 10 '23 13:07 alan-maguire

https://pastebin.com/PbFnpyFi

andrey-admin avatar Jul 10 '23 13:07 andrey-admin

i suspect the issue is https://lore.kernel.org/bpf/[email protected]/ where the bpf skeleton generation does not like the .rodata.cst16 section. It may be that a newer bpftool might help; i'm using bpftool 5.15 on ubuntu from the linux-tools package synced to the kernel version. however we may also be able to work around this; you could try making the following changes to tcp_cong_tuner.bpf.c and rebuilding:

diff --git a/src/tcp_cong_tuner.bpf.c b/src/tcp_cong_tuner.bpf.c
index 77957b3..ab6661a 100644
--- a/src/tcp_cong_tuner.bpf.c
+++ b/src/tcp_cong_tuner.bpf.c
@@ -40,7 +40,7 @@ static __always_inline bool retransmit_threshold(struct remote_host *remote_host, u32 segs_out, u32 total_retrans) {
-	const char bbr[CONG_MAXNAME] = "bbr";
+	static const char bbr[4] = "bbr";
 	__u64 now;

 	if (!remote_host)
@@ -188,7 +188,7 @@ int BPF_PROG(cong_retransmit, struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = (struct tcp_sock *)sk;
 	struct in6_addr *key = &sin6->sin6_addr;
 	__u32 segs_out = 0, total_retrans = 0;
-	const char bbr[CONG_MAXNAME] = "bbr";
+	static const char bbr[4] = "bbr";
 	int id = TCP_CONG_BBR;
 	struct net *net;

that was enough to get rid of the .rodata.cst16 section (it's replaced with .rodata.str1.1 that bpftool can handle).

alan-maguire avatar Jul 10 '23 14:07 alan-maguire

patch got mangled but replaces

const char bbr[CONG_MAXNAME] = "bbr";

...with

static const char bbr[4] = "bbr";

...in the two places it is declared in tcp_cong_tuner.bpf.c

alan-maguire avatar Jul 10 '23 14:07 alan-maguire

static volatile const char const bbr[4] = {'b', 'b', 'r', '\0'}; B-)

pavlinux avatar Jul 10 '23 14:07 pavlinux

Can you attach the patch as a file, please?

andrey-admin avatar Jul 10 '23 14:07 andrey-admin

Still segfaults.

#0 0x00007fa2a93a2de2 in init (tuner=0x558225754260) at tcp_cong_tuner.c:58
58 bpftuner_bpf_init(tcp_cong, tuner, NULL);
(gdb) bt
#0 0x00007fa2a93a2de2 in init (tuner=0x558225754260) at tcp_cong_tuner.c:58
#1 0x00007fa2a99efa88 in bpftuner_init (path=0x7fff10a1bfb0 "/usr/lib64/bpftune//tcp_cong_tuner.so") at libbpftune.c:655
#2 0x0000558225735e32 in init (library_dir=0x5582257374aa "/usr/lib64/bpftune/") at bpftune.c:199
#3 0x0000558225736541 in main (argc=2, argv=0x7fff10a1c478) at bpftune.c:391

andrey-admin avatar Jul 10 '23 14:07 andrey-admin

Last lines from -ds:

bpftune: libbpf: prog 'cong_retransmit': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 157
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #4: insn #179 against 'init_net'
bpftune: libbpf: prog 'cong_retransmit': found extern #0 'init_net' (sym 34) for insn #179
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #5: insn #182 against 'bpftune_init_net'
bpftune: libbpf: prog 'cong_retransmit': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 182
bpftune: libbpf: sec '.reltp_btf/tcp_retransmit_skb': relo #6: insn #190 against 'ring_buffer_map'
bpftune: libbpf: prog 'cong_retransmit': found map 0 (ring_buffer_map, sec 9, off 0) for insn #190
bpftune: libbpf: sec '.reliter/tcp': collecting relocation for section(5) 'iter/tcp'
bpftune: libbpf: sec '.reliter/tcp': relo #0: insn #37 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #37
bpftune: libbpf: sec '.reliter/tcp': relo #1: insn #52 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #52
bpftune: libbpf: sec '.reliter/tcp': relo #2: insn #57 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #57
bpftune: libbpf: sec '.reliter/tcp': relo #3: insn #135 against 'debug'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 135
bpftune: libbpf: sec '.reliter/tcp': relo #4: insn #139 against '.rodata'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 4 (tcp_cong.rodata, sec 11, off 0) for insn 139
bpftune: libbpf: failed to find skeleton map '.rodata.str1.1'
Segmentation fault (core dumped)

andrey-admin avatar Jul 10 '23 14:07 andrey-admin

can you check the bpftool and clang versions ("bpftool --version", "clang --version")? ubuntu with bpftool v5.15 and clang v14 work fine for me, even with the .rodata.str1.1 sections.

alan-maguire avatar Jul 10 '23 15:07 alan-maguire

root@nginx-01:/usr/src/bpf/bpftune# bpftool --version
/usr/lib/linux-tools/5.19.0-1026-gcp/bpftool v7.0.0
using libbpf v1.0
features: libbpf_strict
root@nginx-01:/usr/src/bpf/bpftune# clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

andrey-admin avatar Jul 11 '23 08:07 andrey-admin

Machine - google cloud virtual server (n2d-highcpu-16)

andrey-admin avatar Jul 11 '23 08:07 andrey-admin

thanks; the above looks fine and similar to my setup, so I'm puzzled why we're seeing different things. regardless, i think i've fixed one of the issues here: when bpftune opens/loads/attaches bpf it uses macros, and these need to return a failure status; otherwise we try to load a program that failed to open, or attach a program that failed to load. i've merged that in pr https://github.com/oracle-samples/bpftune/pull/26 so hopefully that should resolve the segmentation fault, but i don't yet have a good solution for the bpf loading failure.
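
to illustrate the idea only (this is not the code from the pr): a minimal sketch of an open/load/attach sequence where each stage returns a status and a macro bails out of the caller on the first failure. every name here (example_skel, example_open, example_load, example_attach, EXAMPLE_TRY, example_init) is hypothetical and stands in for the real skeleton and macros, which are not shown in this thread.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a BPF skeleton; not a bpftune or libbpf type. */
struct example_skel {
	int opened;
	int loaded;
	int attached;
};

/* Hypothetical stages: each returns 0 on success or a negative errno. */
int example_open(struct example_skel *s, int fail)
{
	if (fail)
		return -ENOENT;	/* simulate an open failure */
	s->opened = 1;
	return 0;
}

int example_load(struct example_skel *s)
{
	if (!s->opened)
		return -EINVAL;	/* refuse to load something never opened */
	s->loaded = 1;
	return 0;
}

int example_attach(struct example_skel *s)
{
	if (!s->loaded)
		return -EINVAL;	/* refuse to attach something never loaded */
	s->attached = 1;
	return 0;
}

/* Run one stage; on failure, log it and return from the enclosing function. */
#define EXAMPLE_TRY(stage, call)					\
	do {								\
		int err_ = (call);					\
		if (err_) {						\
			fprintf(stderr, "%s failed: %d\n", stage, err_);	\
			return err_;					\
		}							\
	} while (0)

/* Later stages are never reached once an earlier one has failed. */
int example_init(struct example_skel *s, int fail_open)
{
	EXAMPLE_TRY("open", example_open(s, fail_open));
	EXAMPLE_TRY("load", example_load(s));
	EXAMPLE_TRY("attach", example_attach(s));
	return 0;
}
```

calling example_init() with fail_open set returns the open error straight away and leaves loaded/attached untouched, so the caller can report the failure instead of dereferencing state that was never set up.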

alan-maguire avatar Jul 11 '23 08:07 alan-maguire

Just pulled the repo and rebuilt bpftune - no change. Same segmentation fault. Last lines from -ds:

bpftune: libbpf: prog 'cong_retransmit': found map 0 (ring_buffer_map, sec 9, off 0) for insn #190
bpftune: libbpf: sec '.reliter/tcp': collecting relocation for section(5) 'iter/tcp'
bpftune: libbpf: sec '.reliter/tcp': relo #0: insn #37 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #37
bpftune: libbpf: sec '.reliter/tcp': relo #1: insn #52 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #52
bpftune: libbpf: sec '.reliter/tcp': relo #2: insn #57 against 'remote_host_map'
bpftune: libbpf: prog 'bpftune_cong_iter': found map 3 (remote_host_map, sec 9, off 80) for insn #57
bpftune: libbpf: sec '.reliter/tcp': relo #3: insn #135 against 'debug'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 5 (tcp_cong.bss, sec 8, off 0) for insn 135
bpftune: libbpf: sec '.reliter/tcp': relo #4: insn #139 against '.rodata'
bpftune: libbpf: prog 'bpftune_cong_iter': found data map 4 (tcp_cong.rodata, sec 11, off 0) for insn 139
bpftune: libbpf: failed to find skeleton map '.rodata.str1.1'

andrey-admin avatar Jul 11 '23 08:07 andrey-admin

gdb:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f8fd9adadec in init (tuner=0x55ec5431e260) at tcp_cong_tuner.c:58
58 err = bpftuner_bpf_init(tcp_cong, tuner, NULL);
(gdb) bt
#0 0x00007f8fd9adadec in init (tuner=0x55ec5431e260) at tcp_cong_tuner.c:58
#1 0x00007f8fda166a88 in bpftuner_init (path=0x7ffc34acd3f0 "/usr/lib64/bpftune//tcp_cong_tuner.so") at libbpftune.c:655
#2 0x000055ec542ffe32 in init (library_dir=0x55ec543014aa "/usr/lib64/bpftune/") at bpftune.c:199
#3 0x000055ec54300541 in main (argc=2, argv=0x7ffc34acd8b8) at bpftune.c:391

andrey-admin avatar Jul 11 '23 08:07 andrey-admin

# gdb --args `which bpftune` -s;
(gdb) break  tcp_cong_tuner.c:58
(gdb) run

and next step, fin, step, ... commands;

pavlinux avatar Jul 11 '23 09:07 pavlinux

root@nginx-01:/usr/src/bpf/bpftune# gdb --args `which bpftune` -s;
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/bpftune...
(gdb) run
Starting program: /usr/sbin/bpftune -s
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bpftune: bpftune works fully
bpftune: bpftune supports per-netns policy (via netns cookie)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff784edec in init (tuner=0x555555575260) at tcp_cong_tuner.c:58
58 err = bpftuner_bpf_init(tcp_cong, tuner, NULL);

andrey-admin avatar Jul 11 '23 09:07 andrey-admin

https://github.com/oracle-samples/bpftune/pull/27 may help here i think; if you get a chance, would you mind rebuilding/retesting. thanks!

alan-maguire avatar Jul 11 '23 09:07 alan-maguire

Yeah, bpftune started OK, without any fault. Checking how it's working.

Thanks!

andrey-admin avatar Jul 11 '23 09:07 andrey-admin

great, thanks for taking the time to work through this! i'm still hoping to get to the bottom of why the str sections cause issues at your end; until that is fixed, the associated congestion tuner will not load.

alan-maguire avatar Jul 11 '23 11:07 alan-maguire

Sorry, I missed checking syslog after start.

Jul 11 09:41:09 nginx-01 bpftune[12736]: bpftune works fully
Jul 11 09:41:09 nginx-01 bpftune[12736]: bpftune supports per-netns policy (via netns cookie)
Jul 11 09:41:09 nginx-01 bpftune[12736]: tcp_cong open bpf: No such process
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//tcp_cong_tuner.so: No such process
Jul 11 09:41:09 nginx-01 bpftune[12736]: could not open /proc/sys/net/ipv6/neigh/default/gc_interval (netns fd 0) for reading: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error reading tunable 'net.ipv6.neigh.default.gc_interval': No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//neigh_table_tuner.so: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: could not open /proc/sys/net/ipv6/route/max_size (netns fd 0) for reading: No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error reading tunable 'net.ipv6.route.max_size': No such file or directory
Jul 11 09:41:09 nginx-01 bpftune[12736]: error initializing '/usr/lib64/bpftune//route_table_tuner.so: No such file or directory

But all the files are in place:
root@nginx-01:/usr/src/bpf/bpftune# ls -ld /usr/lib64/bpftune//tcp_cong_tuner.so /usr/lib64/bpftune//neigh_table_tuner.so /usr/lib64/bpftune//route_table_tuner.so
-rwxr-xr-x 1 root root 1626360 Jul 11 09:40 /usr/lib64/bpftune//neigh_table_tuner.so
-rwxr-xr-x 1 root root 1622040 Jul 11 09:40 /usr/lib64/bpftune//route_table_tuner.so
-rwxr-xr-x 1 root root 896456 Jul 11 09:40 /usr/lib64/bpftune//tcp_cong_tuner.so

andrey-admin avatar Jul 11 '23 11:07 andrey-admin

the "no such file or directory" comes from an ENOENT error; in the case of the neigh_table_tuner, what's missing are the ipv6 tunables, not the .so files. in the case of the tcp congestion tuner, the tuner is not there due to the issues with the string section; it's just that we don't fall over and segfault now. if ipv6 is disabled that probably explains the neigh table tuner issues.
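
a quick way to see where those messages come from: the text is just strerror() of the errno value, so ENOENT prints "No such file or directory" and ESRCH prints "No such process" (on glibc, default locale). below is a small sketch, not bpftune code, that probes a tunable path the same way; probe_tunable and report_tunable are hypothetical helper names.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a sysctl file for reading; return 0 or a negative errno. */
int probe_tunable(const char *path)
{
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -errno;
	fclose(fp);
	return 0;
}

/* Print whether the tunable is readable, with the errno text if not. */
void report_tunable(const char *path)
{
	int err = probe_tunable(path);

	if (err)
		printf("%s: %s\n", path, strerror(-err));
	else
		printf("%s: present\n", path);
}
```

running report_tunable("/proc/sys/net/ipv6/neigh/default/gc_interval") on a host with ipv6 disabled should report the missing file, which points at the sysctl path rather than the tuner .so.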

alan-maguire avatar Jul 11 '23 13:07 alan-maguire

So, everything should be working properly? How can I check status or some stats while bpftune is running as a daemon?

Can you fix those errors for configurations with ipv6 disabled, please?

And the line "tcp_cong open bpf: No such process" - is that OK too?

Thanks!

andrey-admin avatar Jul 11 '23 13:07 andrey-admin

i'm working on adding support for handling ipv6 disabled by making some tunables optional; should have a fix for this in the next few days. the tcp_cong_tuner issue is that bpf won't load due to the .rodata.str1.1 section being a problem on your system. i haven't been able to reproduce that but will try and fix it once i can.

alan-maguire avatar Jul 11 '23 13:07 alan-maguire

If you need any data from my system, just tell me how to collect it and I will.

Thanks.

andrey-admin avatar Jul 11 '23 13:07 andrey-admin