bpftrace
bpftrace copied to clipboard
Fix error in dereferencing kernel double pointers
BPF verifier can detect safety of pointer accesses for BTF-based probes (k(ret)func, iter) and therefore it is not necessary to use bpf_probe_read_kernel
inside such probes. This feature was enabled in bpftrace by commit c2c3ab96 ("Support identifying btf type").
Unfortunately, the verifier is not able to track BTF information for dereferences and array accesses on double pointers so, e.g. the following script fails to load:
# bpftrace -e 'kfunc:__module_get { print(args.module->trace_events[0]->flags);' } -v
INFO: node count: 13
Attaching 1 probe...
Error log:
reg type unsupported for arg#0 function kfunc_vmlinux___module_get#22
0: R1=ctx(off=0,imm=0) R10=fp0
0: (79) r1 = *(u64 *)(r1 +0)
func '__module_get' arg0 has btf_id 250 type STRUCT 'module'
1: R1_w=ptr_module(off=0,imm=0)
1: (79) r1 = *(u64 *)(r1 +1128) ; R1_w=scalar()
2: (79) r1 = *(u64 *)(r1 +0)
R1 invalid mem access 'scalar'
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
ERROR: Error loading program: kfunc:vmlinux:__module_get
A similar error happens when dereferencing the double pointer with *
# bpftrace -e 'kfunc:__module_get { print((*args.module->trace_events)->flags);' } -v
An analogous program fails to load even when written using libbpf.
We need to use bpf_probe_read_kernel
for such cases so do not propagate the SizedType::is_btftype
flag when observing a dereference or array access of a double pointer in semantic analyser.
This also fixes another bug related to array access - field analyser was not doing field resolution on array accesses on pointers to structs.
Overall, this resolves #3016.
Checklist
- [ ] Language changes are updated in
man/adoc/bpftrace.adoc
- [x] User-visible and non-trivial changes updated in
CHANGELOG.md
- [x] The new behaviour is covered by tests
nice find! did you report this upstream? wonder if this should be supported by kernel
Not yet, but I'll report it. I wanted to check first if it is a bug or a feature :)
Seems more like a lack of a feature to me
Turns out the same problem exists when dereferencing via array accesses, i.e. args.trace_events[0]->flags
. I'll try to fix that, too, in this PR.
Fixed the array access on our side, too. I'll report/fix kernel upstream.
Originally, I wanted to separate the dereference fix from the array access one b/c the latter is one of two issues causing #3016. It turned out that it makes more sense to have them together so I added the other fix for #3016 to this PR. We now have two changelog entries with the same PR number but I think that it makes sense to merge everything together.
I'll also update the PR description.
Why is it that we need to reset is_btftype
on the type contained within the array, but for pointers only the outer type gets reset?
Why is it that we need to reset
is_btftype
on the type contained within the array, but for pointers only the outer type gets reset?
It's actually the other way around - for pointers, we just reset the pointed type, for arrays we reset both the array and the element types.
The reason is that there's a slight difference in codegen for ptr->x
and ptr[0]
:
- For
ptr->x
, we have two operations - dereference and field access. The load/probe_read is only done during the field access so the check foris_btftype
is done on the type of*ptr
. - For
ptr[0]
, we have just a single operation so the check must be done on the type ofptr
. We must also resetis_btftype
on the element type to avoid further propagation.
Is it possible to add a unit test for the FieldAnalyser change?
Sure! I added a test (and confirmed that it failed prior to this PR).
While adding the test, I updated data_source.c
for BTF and DWARF which changed one codegen test.
This slipped off my radar. I rebased onto the current master and will merge today afternoon if there are no objections.
Looks like a legit unit test failure
Looks like a legit unit test failure
This is a tricky one. It doesn't reproduce on a local build (at least not on my setup), only in Nix. It's caused by this (seemingly trivial) change in data_source.c
:
diff --git a/tests/data/data_source.c b/tests/data/data_source.c
index c6a7f713..eaecbc45 100644
--- a/tests/data/data_source.c
+++ b/tests/data/data_source.c
@@ -21,7 +21,7 @@ struct Foo3 {
struct Foo3 foo3;
-struct Foo3 *func_1(int a, struct Foo1 *foo1, struct Foo2 *foo2)
+struct Foo3 *func_1(int a, struct Foo1 *foo1, struct Foo2 *foo2, struct Foo3 *foo3)
{
return 0;
}
@@ -115,7 +115,7 @@ int main(void)
struct bpf_iter__task_file iter_task_file;
struct bpf_iter__task_vma iter_task_vma;
- func_1(0, 0, 0);
+ func_1(0, 0, 0, 0);
bpf_iter_task();
bpf_iter_task_file();
With this, the field_analyser_dwarf.uprobe_args
unit test fails because liblldb willl fail to get function parameters for func_3
. I was able to narrow it down to dwarf_parser.cpp:80 where func_3
is found but, for some reason, it is not identified as a function and therefore fn
will be null. Maybe @ttreyer could give some help on this?
With this, the
field_analyser_dwarf.uprobe_args
unit test fails because liblldb willl fail to get function parameters forfunc_3
. I was able to narrow it down to dwarf_parser.cpp:80 wherefunc_3
is found but, for some reason, it is not identified as a function and thereforefn
will be null. Maybe @ttreyer could give some help on this?
I have reproduced the environment with the error and will have a look at it 🙂
This look like an issue with LLDB and/or GCC 12.3.0, but not with GCC 11.4.1. I'm investigating further...
In the meantime, as a workaround to unblock this PR, you can add __attribute__((noinline))
to func_3
, which force GCC to emit enough DWARF for LLDB to correctly locate the function and its information.
In the meantime, as a workaround to unblock this PR, you can add
__attribute__((noinline))
tofunc_3
, which force GCC to emit enough DWARF for LLDB to correctly locate the function and its information.
Works great, thanks! I added it to the PR, should be now ready to merge.