BPF: Add helpers for BPF links and BTF objects, update tools/ script to work with Linux 5.10+

Open qmonnet opened this issue 3 years ago • 6 comments

bpf_prog.aux.member_("trampoline") is no longer available starting from kernel 5.10, so the BPF tool script fails to display the linked functions properly on newer systems. This PR fixes the issue by retrieving the link and the tracing link associated with the program in order to access the target.

Since we need to iterate over BPF links to do that, add the relevant iterator, and add one for BTF objects as well, since the two are very similar. Update the BPF tool script so it can use those new iterators to list links and BTF objects.

Please refer to individual commit descriptions for more details.

qmonnet avatar Jan 29 '22 20:01 qmonnet

Thanks a lot for the feedback and review, and apologies for the delay. I'm still working on this PR. I fixed the issues you reported some time ago, but so far I've been unable to find the time to test the latest changes properly. I'm pushing my code anyway but marking as draft, until I get a chance to check that it works correctly.

qmonnet avatar Feb 21 '22 10:02 qmonnet

Hi @qmonnet, I'm catching up on my backlog of PRs and I'm revisiting this one. I just merged the first commit from this PR adding the bpf_link_for_each() and bpf_btf_for_each() helpers: 764a858ee6a38618dad91f48dfd81f2949d43e91. I also added test cases for all of the BPF helpers: 43f045ae1ad97274628a9d44d9749dfccfe9c3a5.

For the second commit fixing bpf_inspect.py, what sort of testing did you want to do? Now that the unit tests can test BPF, maybe we can add a test case there.

For the third commit, do the link and btf printing commands print anything that isn't available via bpftool? If not, it may not make sense to maintain that code in bpf_inspect.py, too, since its charter is to "list ... properties unavailable via kernel API".

osandov avatar Jul 22 '22 06:07 osandov

> Hi @qmonnet, I'm catching up on my backlog of PRs and I'm revisiting this one. I just merged the first commit from this PR adding the bpf_link_for_each() and bpf_btf_for_each() helpers: 764a858. I also added test cases for all of the BPF helpers: 43f045a.

Thanks a lot! Sorry for failing to follow up here. The new tests look neat!

> For the second commit fixing bpf_inspect.py, what sort of testing did you want to do? Now that the unit tests can test BPF, maybe we can add a test case there.

I wanted to make sure that printing trampolines/target programs would still work as expected on “all” kernel versions. We've got 3 cases after this patch: pre-5.5, 5.5 <= x < 5.10, and >= 5.10. I tested the last case (with 5.15) on my laptop and created a VM to try the first case (5.4), but I never found the time to create a second VM to test the remaining case. Although, given that we have not changed how the script behaves for kernels < 5.10 and that it works as expected on 5.4, I wouldn't expect too many bad surprises and maybe we're good to (rebase and) merge this change.
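The three cases above can be sketched as a small classifier. This is only an illustration and not code from the PR: the function names and case labels are made up here, and it assumes the kernel release string starts with `major.minor`, as `uname -r` output does.

```python
import re
from typing import Tuple


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-43-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def trampoline_case(release: str) -> str:
    """Map a kernel release to one of the three cases discussed in this thread."""
    version = kernel_major_minor(release)
    if version < (5, 5):
        return "pre-5.5"
    if version < (5, 10):
        return "5.5 to 5.9"
    # From 5.10 on, bpf_prog.aux no longer has a "trampoline" member,
    # so the target has to be reached through the program's tracing link.
    return "5.10+"
```

With this, `trampoline_case("5.4.0")` falls in the first case and `trampoline_case("5.15.0")` in the last, which are the two configurations tested so far. A script could equally probe for the struct member instead of comparing versions; the comparison is used here only because it maps directly onto the ranges listed above.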

It would probably be a good idea to have a test that shows programs with the helper and checks the target program when used with a trampoline. I haven't dug enough yet to check whether you already have this, or how you are running your CI. Do you cover several kernel versions?

> For the third commit, do the link and btf printing commands print anything that isn't available via bpftool? If not, it may not make sense to maintain that code in bpf_inspect.py, too, since its charter is to "list ... properties unavailable via kernel API".

The link and btf listings should both be available from bpftool, that's correct. Listing programs and maps also has some overlap with bpftool, so I thought it would be nice to have links and BTF in drgn too, for the sake of completeness, given that it was already able to show some of the BPF objects.

Obviously that's your call. If you prefer to leave them aside, I guess we could skip that patch.

qmonnet avatar Jul 23 '22 12:07 qmonnet

[Rebased on current main.]

qmonnet avatar Jul 23 '22 12:07 qmonnet

> I wanted to make sure that printing trampolines/target programs would still work as expected on “all” kernel versions. We've got 3 cases after this patch: pre-5.5, 5.5 <= x < 5.10, and >= 5.10. I tested the last case (with 5.15) on my laptop and created a VM to try the first case (5.4), but I never found the time to create a second VM to test the remaining case. Although, given that we have not changed how the script behaves for kernels < 5.10 and that it works as expected on 5.4, I wouldn't expect too many bad surprises and maybe we're good to (rebase and) merge this change.

> It would probably be a good idea to have a test that shows programs with the helper and checks the target program when used with a trampoline. I haven't dug enough yet to check whether you already have this, or how you are running your CI. Do you cover several kernel versions?

Yup, the CI tests a bunch of kernel versions: https://github.com/osandov/drgn/blob/main/setup.py#L134. The design is documented here: https://github.com/osandov/drgn/tree/main/vmtest.

drgn's test suite doesn't have any test cases for tools yet, so that's something I'd like to figure out. If you can give me an example of how to create some BPF trampolines (perhaps with BPF_TRACE_FENTRY programs?), I can adapt that into a test. I'm imagining something like doing some bpf(2) calls to create the programs like we do for the helper tests, then executing the tool and checking its output.
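A rough sketch of the "execute the tool and check its output" half of that idea follows. It is an assumption about how such a test might look, not part of drgn's test suite: `run_tool` and `missing_substrings` are hypothetical helper names, and the real command line for the tool and the strings to look for would still need to be filled in. Creating the BPF programs beforehand is deliberately left out.

```python
import subprocess
import sys


def run_tool(argv, timeout=60):
    """Run a command and return (exit status, stdout, stderr) as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


def missing_substrings(output, expected):
    """Return the expected substrings that do not appear in the output."""
    return [text for text in expected if text not in output]


if __name__ == "__main__":
    # Stand-in command: a real test would run the tool under test instead.
    status, out, _ = run_tool([sys.executable, "-c", "print('prog 42 fentry')"])
    assert status == 0
    assert missing_substrings(out, ["42", "fentry"]) == []
```

Checking for substrings rather than comparing the full output keeps such a test tolerant of values that change between runs, such as program IDs assigned by the kernel.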

osandov avatar Jul 24 '22 22:07 osandov

> Yup, the CI tests a bunch of kernel versions: https://github.com/osandov/drgn/blob/main/setup.py#L134. The design is documented here: https://github.com/osandov/drgn/tree/main/vmtest.

Thanks for the pointers.

> If you can give me an example of how to create some BPF trampolines (perhaps with BPF_TRACE_FENTRY programs?)

I don't have a minimal example to point you to :/ and it seems to be a bit more involved. All examples I can think of are using libbpf (usually with BPF skeletons). For testing here on my side, I profiled an eBPF program (any should do) with bpftool (bpftool prog profile id 685 cycles). This attached new programs with fentry/fexit.
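For anyone wanting to script that manual step, here is a minimal sketch that builds the same bpftool command line. The helper names are hypothetical, the program ID is whatever is loaded on the machine at hand, and actually starting the profile needs bpftool in PATH and sufficient privileges, so treat it as a starting point only.

```python
import shutil
import subprocess


def bpftool_profile_argv(prog_id, metrics=("cycles",), duration=None):
    """Build the argv for 'bpftool prog profile id <prog_id> ...'."""
    argv = ["bpftool", "prog", "profile", "id", str(prog_id)]
    if duration is not None:
        # Optional run time in seconds; without it bpftool profiles
        # until it is interrupted.
        argv += ["duration", str(duration)]
    argv += list(metrics)
    return argv


def start_profile(prog_id, **kwargs):
    """Start the profile in the background and return the Popen handle."""
    if shutil.which("bpftool") is None:
        raise RuntimeError("bpftool not found in PATH")
    return subprocess.Popen(bpftool_profile_argv(prog_id, **kwargs))
```

While the profile is running, the fentry/fexit programs it attaches should be visible to the inspection script; they go away again once bpftool exits.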

I think that in libbpf, it calls attach_trace() and from there bpf_program__attach_btf_id(). In that function we create a BPF link, open a raw tracepoint for the program (bpf_raw_tracepoint_open()), and attach its fd to the link. There's also some BTF involved before all that, I think: we need to pass the relevant BTF id when loading the fentry program (so before attaching).

qmonnet avatar Jul 30 '22 21:07 qmonnet