Failing to run example in Docker

Open Edward-Knight opened this issue 9 months ago • 9 comments

I'm trying to run the example from the guide in the documentation, but I get the following error:

Attached to pid 9
E20250720 16:34:02.623654    11 OIDebugger.cpp:1604] syscall: GETREGS failed for process 9: Input/output error
E20250720 16:34:02.624195    11 OIDebugger.cpp:296] setUpSegment failed!!!
E20250720 16:34:02.624416    11 OID.cpp:397] Failed to initialise segments in target process with PID 9

I'm running in Docker with this Dockerfile:

FROM ubuntu:noble
RUN apt-get update && apt-get install -y build-essential curl clang nix-bin
RUN curl -LsSf https://github.com/facebookexperimental/object-introspection/archive/refs/heads/main.tar.gz | tar xz
RUN cd object-introspection-main/examples/web/AddrBook && make
RUN cd object-introspection-main && nix --extra-experimental-features "nix-command flakes" --option filter-syscalls false run . -- --help

Building like so:

cat Dockerfile | docker build --platform linux/amd64 --tag oi -

Running like so:

docker run -it --rm oi
cd object-introspection-main
examples/web/AddrBook/addrbook > /dev/null &
nix --extra-experimental-features "nix-command flakes" --option filter-syscalls false run . -- -S 'entry:_ZN11AddressBook12DumpContactsEv:this' -p `pgrep addrbook` -J

NB: I didn't include the -c build/oid-cfg.toml argument as I built using nix (I tried and failed to build without).

Edward-Knight avatar Jul 20 '25 16:07 Edward-Knight

This may be an issue with my Docker setup: https://github.com/docker/for-mac/issues/6921

I'll attempt a different setup to confirm

Edward-Knight avatar Jul 20 '25 17:07 Edward-Knight

@Edward-Knight We pretty much always run the oid binary as root but you should just need to add the CAP_SYS_PTRACE capability (IIRC) although it's been a long time since we did that. Let us know if you need anything else.
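As a rough way to confirm whether a process inside the container actually holds that capability, the sketch below decodes the CapEff mask that Linux exposes in /proc/<pid>/status. The helper names are made up for illustration, and the bit index 19 for CAP_SYS_PTRACE is taken from linux/capability.h; treat it as a diagnostic aid, not as part of oid.

```python
CAP_SYS_PTRACE = 19  # bit index from <linux/capability.h>


def has_capability(cap_hex: str, cap: int = CAP_SYS_PTRACE) -> bool:
    """Return True if bit `cap` is set in a hex mask such as the CapEff field."""
    return bool((int(cap_hex, 16) >> cap) & 1)


def effective_caps(status_text: str) -> str:
    """Extract the CapEff hex mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")


def check_self() -> bool:
    """Hypothetical convenience wrapper: inspect the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return has_capability(effective_caps(f.read()))
```

Calling check_self() inside the container before and after adding --cap-add=SYS_PTRACE to the docker run command should show the bit flipping from False to True.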

tyroguru avatar Jul 21 '25 08:07 tyroguru

Using the exact same commands as above, but with Docker running on Linux, I get a new error:

E20250721 18:41:39.574158    11 OIDebugger.cpp:2606] Couldn't attach to target pid 9 (Reason: Operation not permitted)
E20250721 18:41:39.574265    11 OID.cpp:361] Couldn't stop target process with PID 9

Adding in the extra capability in the run command like so:

docker run -it --rm --cap-add=SYS_PTRACE oi

Nets me a new error:

Attached to pid 9
E20250721 18:45:01.964025    25 SymbolService.cpp:342] Failed to lookup function '0x1ed0': 9 not found
E20250721 18:45:01.964176    25 SymbolService.cpp:328] Failed to lookup symbol '_ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964197    25 SymbolService.cpp:743] Failed to create FuncDesc for _ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964226    25 SymbolService.cpp:342] Failed to lookup function '0x1ed0': 9 not found
E20250721 18:45:01.964246    25 SymbolService.cpp:328] Failed to lookup symbol '_ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964267    25 SymbolService.cpp:743] Failed to create FuncDesc for _ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964294    25 SymbolService.cpp:342] Failed to lookup function '0x1ed0': 9 not found
E20250721 18:45:01.964316    25 SymbolService.cpp:328] Failed to lookup symbol '_ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964336    25 SymbolService.cpp:743] Failed to create FuncDesc for _ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964366    25 SymbolService.cpp:342] Failed to lookup function '0x1ed0': 9 not found
E20250721 18:45:01.964386    25 SymbolService.cpp:328] Failed to lookup symbol '_ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964406    25 SymbolService.cpp:743] Failed to create FuncDesc for _ZN11AddressBook12DumpContactsEv
E20250721 18:45:01.964427    25 OIDebugger.cpp:2249] Failed to get all cache paths, aborting!
E20250721 18:45:01.964448    25 OID.cpp:413] Compilation failed

@tyroguru do you have any idea where this error comes from and what I can do to debug further? Thank you for your help so far!

I don't know if this helps or not, but this is the output I get from readelf:

$ readelf -sW examples/web/AddrBook/addrbook | grep _ZN11AddressBook12DumpContactsEv
    24: 0000000000001ed0  1451 FUNC    WEAK   DEFAULT   15 _ZN11AddressBook12DumpContactsEv

Edward-Knight avatar Jul 21 '25 18:07 Edward-Knight

@Edward-Knight no worries and thanks for trying OI! At the minute I don't have a machine to try this on and it would be a few days before I do. Question: is the addrbook binary you are targeting built with DWARF debug? Do you see the debug sections if you execute size -At addrbook | grep "\.debug" ?

tyroguru avatar Jul 21 '25 19:07 tyroguru

Yes, we have debug symbols:

$ size -At addrbook | grep "\.debug"
.debug_info           67121       0
.debug_abbrev          3244       0
.debug_line            8585       0
.debug_str            49066       0
.debug_addr            4600       0
.debug_line_str        1529       0
.debug_loclists       13622       0
.debug_rnglists        4168       0
.debug_str_offsets     6784       0

Edward-Knight avatar Jul 21 '25 20:07 Edward-Knight

That sucks, and apologies for what is basic breakage. Here's a very high-level excuse for why you may have experienced this. Apologies if you've seen the CppCon video which explains this, but there are essentially two major pieces of technology in Object Introspection: the Object Introspection Debugger (oid) with its associated pieces, and OIL (Object Introspection as a Library). They achieve the same ends but do it in very different ways and have mostly separate code bases. oid controls target processes like a regular debugger, just in a non-interactive manner. OIL embeds itself into the application, does code generation/compilation as part of the application build using clang technology, and executes in the application's address space at runtime like a normal library.

Meta stopped using oid internally at the start of 2024 because of issues with the generated DWARF that the internal toolchains were producing. The generated DWARF just wasn't usable in a consistent manner and that was causing enormous headaches for us so we made the decision to discontinue that approach. It is still a very usable approach IMO and we gained enormous, novel insights into the runtime behaviour of massively complex data structures using it. However, if the DWARF isn't usable then it wasn't a viable way forward for us.

OIL, on the other hand, inserts itself into the compilation phase and is much more stable. It uses clang APIs to introspect the types during compilation and does the code generation and JIT compilation phases then (it's AOT really but who's counting?). The introspection code blobs are stored in the generated binary and can be called at runtime with no outside assistance/manipulation. All that's required is an extremely simple code modification (the API is ridiculously simple for something so powerful). Unfortunately it requires changes to users' build systems to shoehorn it into the compilation phase, and this was something that was all done internally within Meta and not published to the OSS code (difficult to generalise that). Just tagging @JakeHillion, who did a lot of the extremely brilliant work on OIL, in case he has comments/additions.

As I said previously, I'm away for a few days and not at a machine where I can test this currently but I will try this out and get back to you.

tyroguru avatar Jul 21 '25 21:07 tyroguru

Thanks for your quick replies, I can understand how bits of the open source code atrophy if they're not actively used (and had consistency issues as you mention).

I was playing around with this in an attempt to look at the layout of various objects in CPython. It's a fairly active area of development by the "faster CPython" team over the last few releases of Python (see one of Mark Shannon's talks on the subject if you're curious). From the Python side there are a few tools like sys.getsizeof(), but to get a better idea of the actual layout one has to dive into the C code and manually create diagrams like the ones in Mark's talk. And as mentioned in your CppCon talk, this approach is not only laborious but can be inaccurate (particularly if done by a non-C programmer like myself!). I don't even know if this is an appropriate type of introspection to use for looking at CPython (being C instead of C++), but thought I'd have a play around with it anyway.
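To illustrate why sys.getsizeof() alone gives a limited picture, here is a minimal sketch: it reports only the shallow size of an object, not the objects it references. The rough_deep_size helper is hypothetical, covers only a few built-in containers, and still says nothing about field layout inside the C structs.

```python
import sys

# A list holding one large string: the list object itself stays small.
payload = "x" * 10_000
container = [payload]

shallow = sys.getsizeof(container)  # list object and its pointer slots only
referent = sys.getsizeof(payload)   # the string the list points to

print(f"list (shallow): {shallow} bytes")
print(f"string it references: {referent} bytes")


def rough_deep_size(obj, seen=None):
    """Hypothetical helper: sum getsizeof over nested built-in containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += rough_deep_size(key, seen) + rough_deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += rough_deep_size(item, seen)
    return total


print(f"list (following references): {rough_deep_size(container)} bytes")
```

Even the recursive version only adds up totals; it cannot show padding, pointers, or per-field offsets, which is the gap a DWARF-driven tool is meant to fill.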

Edward-Knight avatar Jul 21 '25 22:07 Edward-Knight

Replying to a tonne of different things here, sorry I was late getting to this!

We pretty much always run the oid binary as root but you should just need to add the CAP_SYS_PTRACE capability (IIRC) although it's been a long time since we did that. Let us know if you need anything else.

Should be fine without as long as the same user running oid owns the target process. The CI doesn't add CAP_SYS_PTRACE anymore or use root, though it seems the Docker experience might contradict this - it should work totally fine in the root namespace without any extra perms/capabilities.

re: oil

The oilgen binary should work fairly well with a compile_commands.json, but it's been a while since it's been extensively used. There's no CMake support but it's not too hard to plumb in. Happy to answer questions on it if you create a new issue, and we would be happy with an upstream contribution. It sounds from the rest of the context like oid is what you want though, and it should work properly for OSS stuff.

Thanks for your quick replies, I can understand how bits of the open source code atrophy if they're not actively used (and had consistency issues as you mention).

Yep, one of the mistakes I made here is not setting up something to automatically update the Nix flake lock in this repo. I tried to do that when I caught that issue but it's been a while and it's deeply unhappy - still working on it! It seems like the clang/LLVM-15 version has become substantially less useful with some changes to libstdc++'s ranges making it incompatible, so I'm weighing up removing it at the minute. The downside is there's a huge build speed regression in LLVM-17 that we also never debugged, so that would leave us pinned to 16. Will throw in a couple of pull requests when I have time this week to get a better understanding of the current state.

One suggestion I would make is to look at what the CI does. It still gets run occasionally, mostly for NPM updates, so should be valid.

I was playing around with this in an attempt to look at the layout of various objects in CPython. It's a fairly active area of development by the "faster CPython" team over the last few releases of Python (see one of Mark Shannon's talks on the subject if you're curious). From the Python side there are a few tools like sys.getsizeof(), but to get a better idea of the actual layout one has to dive into the C code and manually create diagrams like the ones in Mark's talk. And as mentioned in your CppCon talk, this approach is not only laborious but can be inaccurate (particularly if done by a non-C programmer like myself!). I don't even know if this is an appropriate type of introspection to use for looking at CPython (being C instead of C++), but thought I'd have a play around with it anyway.

The approach sounds sensible to me! The main thing with C code instead of C++ is that you need the oid --chase-raw-pointers (or similar) flag. If you have any pointers which are non-null but invalid it won't work, and if void pointers are used extensively the graph will be incomplete. Other than that it should handle many data structures well. Feel free to create a separate issue/message on the IRC for this too if there are any quick questions. I always thought oid would be fantastic at debugging large memory footprints in interpreters like Python or Nix, so I hope we can get things working for you with our new limited support footprint.

JakeHillion avatar Jul 22 '25 10:07 JakeHillion

Apologies for the delay @Edward-Knight. The current problems seem to be related to us not handling PIE executables (at least that's the issue I'm seeing in my test setup). The workaround is easy for the address book sample case: build it with the -no-pie clang flag. If you have a target binary that is built without PIE then you may be good to try things, but I know that's not a good situation.
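For anyone wanting to check whether a given target is affected, a minimal sketch follows. It reads the e_type field of the ELF header: position-independent executables are recorded as ET_DYN, while traditional fixed-address executables are ET_EXEC. The function names are made up for illustration, and note that shared libraries are also ET_DYN, so this is a heuristic and not something oid itself does.

```python
import struct

ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the e_type name given at least the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field at offset 16 in both ELF32 and ELF64.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def looks_position_independent(path: str) -> bool:
    """Heuristic: True if the file at `path` is recorded as ET_DYN."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == "ET_DYN"
```

Running looks_position_independent("examples/web/AddrBook/addrbook") on the default build and again after rebuilding with -no-pie should show the type change from ET_DYN to ET_EXEC, assuming the toolchain defaults to PIE.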

This raises a much bigger question for the OI project, and that is what to do about our DWARF support. Currently we use the drgn technology for our DWARF work and we have a number of our own additions which were never upstreamed to the main drgn codebase. We probably need to bring our private version up to date with the current drgn codebase, but that's probably a bunch of work. I'm talking to the main developer of the drgn project to figure out the next best steps. Sorry that I don't have better news in the immediate term.

tyroguru avatar Jul 28 '25 20:07 tyroguru