Support BPF program symbolization
As part of the effort of improving our kernel symbolization logic, we would like to support symbolization of addresses mapping to BPF programs. Here is a brain dump roughly outlining what (I think) is necessary to support such symbolization. Everything and anything could be wrong ...
- in `/proc/<pid>/maps`/`PROCMAP_QUERY` BPF programs would be represented with a "name" of `bpf_prog_<some-hex-number>`; `some-hex-number` seems to be the program's "tag" and can be used for finding more information
- use `bpf_prog_get_next_id` to iterate over loaded programs and find the one matching the tag
- use `bpf_prog_get_fd_by_id` to retrieve the program's file descriptor
- use `bpf_obj_get_info_by_fd` to retrieve program information using said file descriptor
- use `bpf_prog_info.{nr_jited_line_info, jited_line_info, line_info_rec_size}` and similar, in conjunction with the kernel's BTF information, to retrieve function name and source code path
  - I'd hope there are examples floating around for this, but haven't checked
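To make the steps above a bit more concrete, here is a rough sketch of the two pure-data pieces: extracting the tag from a `bpf_prog_<hex>` name and mapping an address to a line record. Nothing here is existing blazesym API; `parse_bpf_tag`, `find_line_info`, and `BpfLineInfo` are hypothetical names. The sketch assumes the tag is 8 bytes (`BPF_TAG_SIZE`) printed as 16 hex digits, optionally followed by `_<name>`, and that the `jited_line_info` addresses are sorted and index-aligned with the `line_info` records of a single function. The actual syscalls are left out on purpose.

```rust
/// Size of a BPF program tag in bytes (`BPF_TAG_SIZE` in the kernel UAPI).
const BPF_TAG_SIZE: usize = 8;

/// Extract the tag from `bpf_prog_<hex-tag>` or `bpf_prog_<hex-tag>_<name>`.
/// Hypothetical helper; the accepted name format is an assumption.
fn parse_bpf_tag(name: &str) -> Option<[u8; BPF_TAG_SIZE]> {
    let rest = name.strip_prefix("bpf_prog_")?;
    let hex = rest.get(..BPF_TAG_SIZE * 2)?;
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Anything following the tag must be absent or start with `_`.
    match rest.as_bytes().get(BPF_TAG_SIZE * 2) {
        None | Some(b'_') => {}
        Some(_) => return None,
    }
    let mut tag = [0u8; BPF_TAG_SIZE];
    for (i, byte) in tag.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[i * 2..i * 2 + 2], 16).ok()?;
    }
    Some(tag)
}

/// Mirrors the fields of `struct bpf_line_info` from the kernel UAPI.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct BpfLineInfo {
    insn_off: u32,
    file_name_off: u32, // offset into the BTF string section
    line_off: u32,      // offset into the BTF string section
    line_col: u32,      // line number in the upper 22 bits, column in the lower 10
}

impl BpfLineInfo {
    fn line(&self) -> u32 {
        self.line_col >> 10
    }

    fn column(&self) -> u32 {
        self.line_col & 0x3ff
    }
}

/// Find the record covering `addr`: the last entry whose JITed address is
/// less than or equal to `addr`. Assumes `jited_addrs` is sorted ascending
/// and index-aligned with `records`.
fn find_line_info<'a>(
    addr: u64,
    jited_addrs: &[u64],
    records: &'a [BpfLineInfo],
) -> Option<&'a BpfLineInfo> {
    let idx = jited_addrs.partition_point(|&a| a <= addr).checked_sub(1)?;
    records.get(idx)
}
```

The tag returned by `parse_bpf_tag` would be compared against `bpf_prog_info.tag` while iterating programs, and `file_name_off` of the matching record would then be resolved through the BTF string section. Programs with multiple functions would likely need this lookup done per function rather than over one flat array.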
In terms of integration into blazesym, a good starting point to look at would probably be https://github.com/libbpf/blazesym/blob/40a46a48fd83be25b6b32b3401837a8907d23301/src/symbolize/symbolizer.rs#L896
What data to cache (and at what level) is somewhat of an open question. At the very least I'd say we should remember the result for a given address and reuse it on repeated symbolization. But perhaps a more coarse-grained approach (e.g., caching at the function level, if there is such a thing, or remembering what BPF program maps to what tag) may be useful as well. I have no idea of the performance characteristics of any of the APIs we need to interface with.
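As a strawman for the simplest variant (remembering the result per address), something along these lines might do. `BpfSym` and `BpfSymCache` are made-up names for illustration, and the `resolve` closure stands in for whatever slow path ends up talking to the kernel.

```rust
use std::collections::HashMap;

/// Hypothetical symbolization result; not an existing blazesym type.
#[derive(Clone, Debug, PartialEq)]
struct BpfSym {
    name: String,
    file: Option<String>,
    line: Option<u32>,
}

/// Address-level cache. Negative results (`None`) are remembered as well, so
/// that addresses we failed to symbolize do not hit the slow path again.
struct BpfSymCache {
    by_addr: HashMap<u64, Option<BpfSym>>,
}

impl BpfSymCache {
    fn new() -> Self {
        Self {
            by_addr: HashMap::new(),
        }
    }

    /// Return the cached result for `addr`, invoking `resolve` only on a miss.
    fn symbolize<F>(&mut self, addr: u64, resolve: F) -> Option<BpfSym>
    where
        F: FnOnce(u64) -> Option<BpfSym>,
    {
        if let Some(cached) = self.by_addr.get(&addr) {
            return cached.clone();
        }
        let sym = resolve(addr);
        self.by_addr.insert(addr, sym.clone());
        sym
    }
}
```

One caveat with any such cache: BPF programs can be unloaded and their addresses reused, so entries may go stale and some form of invalidation would probably be needed. A tag-to-program map could sit next to `by_addr` if the coarser level turns out to be worthwhile.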
As I mentioned above, I think we may need some basic BTF support (mostly for string lookup?) as well as BPF syscall bindings. Usage of libbpf-rs is a possibility (should contain both), though I don't know if we really want to add a dependency to libbpf-rs and libbpf longer term. But we can think about that once a POC is working.
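For what it's worth, the string lookup part of BTF is small enough that it may not need a dependency: the string section is a blob of NUL-terminated strings addressed by byte offset. A minimal sketch, assuming the raw section bytes have already been obtained somehow (`btf_str` is a hypothetical helper):

```rust
/// Look up the NUL-terminated string starting at byte offset `off` in a raw
/// BTF string section. Returns `None` if the offset is out of bounds, the
/// string is not terminated, or it is not valid UTF-8.
fn btf_str(strs: &[u8], off: u32) -> Option<&str> {
    let tail = strs.get(off as usize..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..end]).ok()
}
```

Offsets such as `file_name_off` and `line_off` from the line info records would be passed as `off`. Parsing the BTF header to locate the string section in the first place is not covered here.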
We would also require some prerequisite work introducing proper kernel testing infrastructure to be able to test this symbolization on injected programs. At this point I think it mostly comes down to loading BPF programs, as we already support testing on arbitrary kernels using vmtest. Again, this should be provided by libbpf-rs, which I think is a no-brainer to use in a testing context.
cc @jfernandez
Very interested in this work, happy to review and test any PRs! :)
This is now out for review https://github.com/libbpf/blazesym/pull/854
@javierhonduco feel free to try it out and report back. Also, let me know if you have any questions.
Great stuff @danielocfb! Thanks for the heads up! This week I won't have too much time to take a look at this, but will make sure to do it early next week.