Tracing programs cannot be attached to non-unique kernel symbols
Describe the bug
While hacking on https://github.com/cilium/ebpf/pull/890, I decided to try attaching a fentry prog to every kernel symbol on my machine. Many of them fail with:
attach Tracing/TraceFEntry: find target for fentry update_persistent_clock64 in vmlinux: type update_persistent_clock64: multiple candidates for *btf.Func
[1096] STRUCT 'timespec64' size=16 vlen=2
'tv_sec' type_id=1095 bits_offset=0
'tv_nsec' type_id=90 bits_offset=64
..
[28302] FUNC_PROTO '(anon)' ret_type_id=69 vlen=1
'now64' type_id=1096
[28303] FUNC 'update_persistent_clock64' type_id=28302 linkage=static
..
[62944] FUNC_PROTO '(anon)' ret_type_id=69 vlen=1
'now' type_id=1096
[62945] FUNC 'update_persistent_clock64' type_id=62944 linkage=static
update_persistent_clock64 is a weak vmlinux symbol that has an arch-specific implementation:
https://elixir.bootlin.com/linux/v6.1.1/source/kernel/time/ntp.c#L568
https://elixir.bootlin.com/linux/v6.1.1/source/arch/x86/kernel/rtc.c#L103
The argument is named now64 in the weak symbol and now in the arch-specific one.
Expected behavior
The library should either go with the first candidate, or additionally verify binary compatibility between all candidates' function signatures.
Or, we might be stuck between a rock and a hard place if the kernel expects the function's specific BTF id to be given. Perhaps we may need to try loading a prog pointing at each candidate until one is accepted.
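The binary-compatibility check suggested above could look roughly like the sketch below. The param and funcProto types and the sameSignature helper are hypothetical simplifications, not ebpf-go types, and comparing raw type IDs is only meaningful within a single BTF blob. The values are copied from the update_persistent_clock64 dump.

```go
package main

import "fmt"

// param and funcProto are hypothetical, simplified stand-ins for BTF's
// FUNC_PROTO records; they are not the types used by ebpf-go.
type param struct {
	Name   string
	TypeID int
}

type funcProto struct {
	RetTypeID int
	Params    []param
}

// sameSignature reports whether two prototypes agree on the return type and
// on the type of every parameter. Parameter names are deliberately ignored.
func sameSignature(a, b funcProto) bool {
	if a.RetTypeID != b.RetTypeID || len(a.Params) != len(b.Params) {
		return false
	}
	for i := range a.Params {
		if a.Params[i].TypeID != b.Params[i].TypeID {
			return false
		}
	}
	return true
}

func main() {
	// FUNC_PROTO 28302 (weak symbol) and 62944 (arch-specific symbol).
	weak := funcProto{RetTypeID: 69, Params: []param{{"now64", 1096}}}
	arch := funcProto{RetTypeID: 69, Params: []param{{"now", 1096}}}
	fmt.Println(sameSignature(weak, arch)) // prints "true"
}
```

Under this check the two candidates differ only in the parameter name, so they would count as compatible. It says nothing about which symbol the kernel ends up attaching to.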
Looks similar to https://github.com/cilium/ebpf/issues/723 and https://github.com/cilium/ebpf/issues/466.
Can both 28303 and 62945 be attached to? Why is linkage=static on both funcs? Shouldn't one be weak (maybe a clang / pahole version issue)?
I assume one should be weak, but it doesn't look like we can rely on this being set correctly. Even if it's fixed on master, older kernel/pahole versions will have it wrong.
As it turns out, trying to attach Tracing programs to kernel symbols with overlapping names is currently subtly broken in Linux up until at least 6.1. Here's what I found:
λ ~ sudo bpftool btf dump id 1 | grep "'type_show'"
[24381] FUNC 'type_show' type_id=3718 linkage=static
[28217] FUNC 'type_show' type_id=3798 linkage=static
[64201] FUNC 'type_show' type_id=64196 linkage=static
[73167] FUNC 'type_show' type_id=10388 linkage=static
[76730] FUNC 'type_show' type_id=76710 linkage=static
[108946] FUNC 'type_show' type_id=108945 linkage=static
λ ~ grep -E "\stype_show$" /proc/kallsyms
0000000000000000 t type_show
0000000000000000 t type_show
... 18 symbols
In this kernel:
- 6 BTF Funcs named type_show exist. They are not (necessarily) equivalent in function signature, implementation or semantics, so they can do totally different things.
- /proc/kallsyms shows 18 symbols named type_show.
When loading a program from section fentry/type_show:
- ebpf-go will reject the Func name -> btf_id lookup with a 'multiple candidates' error caused by TypeByName()
- libbpf will pick the first-found Func, so 24381 in this case
When the kernel receives the program, it:
- Looks up the Func in vmlinux BTF for the given btf_id.
- Verifies the program taking into account the FuncProto corresponding to the given btf_id.
Finally, when attaching the program, the kernel takes the Func.Name of the given btf_id and looks up any kernel symbol with that name using kallsyms_lookup_name. Note that the returned symbol address doesn't necessarily correspond to the given btf_id: BTF goes through dedup, and BTF IDs are allocated by the compiler in order of declaration.
This means:
- the program is unlikely to be attached to the function intended by the user, especially the more candidates there are
- the program is verified against a signature that often doesn't match the function it ends up attached to, allowing unsafe memory access
As such, we'll keep rejecting program loads for ambiguous attach targets. PR coming to make the error a bit more helpful.