
Support BPF program symbolization

Open • danielocfb opened this issue 1 year ago • 2 comments

As part of the effort of improving our kernel symbolization logic, we would like to support symbolization of addresses mapping to BPF programs. Here is a brain dump roughly outlining what (I think) is necessary to support such symbolization. Everything and anything could be wrong ...

  • in /proc/<pid>/maps / PROCMAP_QUERY BPF programs would be represented with a "name" of bpf_prog_<some-hex-number>; some-hex-number seems to be the program's "tag" and can be used for finding more information
  • use bpf_prog_get_next_id to iterate over loaded programs and find the one with the matching tag
  • use bpf_prog_get_fd_by_id to retrieve the program's file descriptor
  • use bpf_obj_get_info_by_fd to retrieve program information using said file descriptor
  • use bpf_prog_info.{nr_jited_line_info, jited_line_info, line_info_rec_size} and similar, in conjunction with the kernel's BTF information, to retrieve the function name and source code path
    • I'd hope there are examples floating around for this, but haven't checked
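The lookup steps in the list above could be sketched roughly as follows. This is a minimal, self-contained sketch, not blazesym code: the syscalls are replaced by an in-memory slice, and `ProgInfo`, `LineRec`, and all function names are hypothetical. It assumes the tag is printed as 16 hex digits (the kernel's `BPF_TAG_SIZE` is 8 bytes), optionally followed by `_<name>` as in `/proc/kallsyms`. It also assumes a program with a single JITed image whose `jited_line_info` addresses are ascending; programs with multiple subprograms may not satisfy that. The `line_col` decoding mirrors the kernel's `BPF_LINE_INFO_LINE_NUM`/`BPF_LINE_INFO_LINE_COL` macros.

```rust
/// Size of a BPF program tag in bytes (the kernel's `BPF_TAG_SIZE`).
const BPF_TAG_SIZE: usize = 8;

/// Simplified stand-in for one `struct bpf_line_info` record.
#[derive(Debug, PartialEq)]
struct LineRec {
    file_name_off: u32, // offset into the BTF string section
    line_col: u32,      // line number in the upper 22 bits, column in the lower 10
}

impl LineRec {
    fn line(&self) -> u32 {
        self.line_col >> 10
    }
    fn col(&self) -> u32 {
        self.line_col & 0x3ff
    }
}

/// Hypothetical subset of what `bpf_obj_get_info_by_fd` reports per program.
struct ProgInfo {
    id: u32,
    tag: [u8; BPF_TAG_SIZE],
    jited_line_info: Vec<u64>, // JITed address of each line record (assumed ascending)
    line_info: Vec<LineRec>,   // parallel to `jited_line_info`
}

/// Extract the tag from `bpf_prog_<hex>`, optionally followed by `_<name>`.
fn parse_prog_tag(sym: &str) -> Option<[u8; BPF_TAG_SIZE]> {
    let hex = sym.strip_prefix("bpf_prog_")?.split('_').next()?;
    // Reject anything that is not exactly 16 ASCII hex digits.
    if hex.len() != 2 * BPF_TAG_SIZE || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut tag = [0u8; BPF_TAG_SIZE];
    for (i, byte) in tag.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(tag)
}

/// Linear scan standing in for the `bpf_prog_get_next_id` loop.
fn find_by_tag<'a>(progs: &'a [ProgInfo], tag: &[u8; BPF_TAG_SIZE]) -> Option<&'a ProgInfo> {
    progs.iter().find(|p| p.tag == *tag)
}

/// Pick the last line record whose JITed address is <= `addr`.
fn line_for_addr(prog: &ProgInfo, addr: u64) -> Option<&LineRec> {
    let idx = prog.jited_line_info.partition_point(|&a| a <= addr);
    idx.checked_sub(1).and_then(|i| prog.line_info.get(i))
}
```

A caller would still have to check that the address lies within the program's JITed range before trusting `line_for_addr`, because any address past the last record maps to the final line.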

In terms of integration into blazesym, a good starting point to look at would probably be https://github.com/libbpf/blazesym/blob/40a46a48fd83be25b6b32b3401837a8907d23301/src/symbolize/symbolizer.rs#L896

What data to cache (and at what level) is somewhat of an open question. At the very least, I'd say we should remember the result for a given address and reuse it on repeated symbolization. But a more coarse-grained approach (e.g., caching at the function level, if there is such a thing, or remembering which BPF program maps to which tag) may be useful as well. I have no idea about the performance characteristics of any of the APIs we need to interface with.
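One possible shape for the two caching levels mentioned above: a coarse map from tag to program ID plus a fine map from address to result. This is a minimal sketch with hypothetical names (`BpfSymCache`, `Sym`); the actual lookups are passed in as closures so that it stays self-contained. It does not handle invalidation: if a program is unloaded and its addresses are reused, cached entries would become stale, which a real implementation would have to account for.

```rust
use std::collections::HashMap;

/// Hypothetical symbolization result for a single address.
#[derive(Clone, Debug, PartialEq)]
struct Sym {
    name: String,
    line: Option<u32>,
}

/// Coarse cache (tag -> program ID) plus fine cache (address -> result).
#[derive(Default)]
struct BpfSymCache {
    tag_to_id: HashMap<[u8; 8], u32>,
    by_addr: HashMap<u64, Sym>,
}

impl BpfSymCache {
    /// `find_id` stands in for the tag search over loaded programs and
    /// `resolve` for the per-program line info lookup; each runs only on a miss.
    fn symbolize(
        &mut self,
        addr: u64,
        tag: [u8; 8],
        find_id: impl FnOnce(&[u8; 8]) -> Option<u32>,
        resolve: impl FnOnce(u32, u64) -> Option<Sym>,
    ) -> Option<Sym> {
        // Fine-grained hit: reuse the earlier result for this exact address.
        if let Some(sym) = self.by_addr.get(&addr) {
            return Some(sym.clone());
        }
        // Coarse-grained hit: skip the program search if the tag is known.
        let id = match self.tag_to_id.get(&tag).copied() {
            Some(id) => id,
            None => {
                let id = find_id(&tag)?;
                self.tag_to_id.insert(tag, id);
                id
            }
        };
        let sym = resolve(id, addr)?;
        self.by_addr.insert(addr, sym.clone());
        Some(sym)
    }
}
```

With this layout, a second address inside an already-seen program only pays for the per-program lookup, not for another scan over all loaded programs.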

As mentioned above, I think we may need some basic BTF support (mostly for string lookup?) as well as BPF syscall bindings. Using libbpf-rs is a possibility (it should contain both), though I don't know if we really want to take on a dependency on libbpf-rs and libbpf longer term. But we can think about that once a POC is working.
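If the BTF support mostly comes down to string lookup, its core may be small: BTF names are offsets into a string section made of NUL-terminated strings, with the empty string at offset 0. A minimal sketch, assuming the string section of the relevant BTF object (for example, the one referenced by `bpf_prog_info.btf_id`) has already been read into memory; `btf_str` is a hypothetical helper, not a blazesym or libbpf-rs API:

```rust
use std::ffi::CStr;

/// Resolve a BTF string offset (e.g., a type's `name_off`, or the
/// `file_name_off`/`line_off` of a line info record) against the raw
/// BTF string section. Returns `None` for out-of-range offsets, missing
/// NUL terminators, or non-UTF-8 data.
fn btf_str(strs: &[u8], off: u32) -> Option<&str> {
    let tail = strs.get(off as usize..)?;
    CStr::from_bytes_until_nul(tail).ok()?.to_str().ok()
}
```

Locating the string section itself (via the `str_off`/`str_len` fields of the BTF header) is left out here.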

We would also need some prerequisite work introducing proper kernel testing infrastructure, so that we can test this symbolization on injected programs. At this point I think it mostly comes down to loading BPF programs, as we already support testing on arbitrary kernels using vmtest. Again, this should be provided by libbpf-rs, which I think is a no-brainer to use in a testing context.

danielocfb • Sep 24 '24 17:09

cc @jfernandez

danielocfb • Sep 24 '24 17:09

Very interested in this work, happy to review and test any PRs! :)

javierhonduco • Sep 26 '24 14:09

This is now out for review https://github.com/libbpf/blazesym/pull/854

@javierhonduco feel free to try it out and report back. Also, let me know if you have any questions.

d-e-s-o • Oct 18 '24 17:10

Great stuff @danielocfb! Thanks for the heads up! I won't have much time to look at this this week, but will make sure to do it early next week.

javierhonduco • Oct 22 '24 12:10