
Failure to read symbols inside docker-container due to broken /proc/pid/exe link

Open ppershing opened this issue 1 year ago • 3 comments

Hi. I encountered this problem while trying to trace a binary inside a Docker container. The problem manifests as a rather misleading

Cannot select a snapshot symbol because magic-trace can't find that executable's symbol table. Was it built without debug info, or with debug info magic-trace doesn't understand?

message, which doesn't indicate a failure to open the executable. strace helped me find the root cause, since running the same binary outside the container worked fine.

The reason is that /proc/<pid>/exe points to the path that was used to launch the process inside the container, so the path refers to directories inside the container. That path is not valid outside the container.

I believe the problem can be fixed by skipping Core_unix.readlink here https://github.com/janestreet/magic-trace/blob/7ac635e41438248c3a3348c039dae00854f38342/src/trace.ml#L722 and instead reading /proc/<pid>/exe directly.
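To make the difference concrete, here is a minimal C sketch (magic-trace itself is OCaml, so this is an illustration, not its code). The helpers `has_elf_magic` and `probe_exe` are hypothetical. They check a process's executable two ways: by resolving /proc/<pid>/exe with readlink and opening the resulting path, which is the approach described above, and by opening /proc/<pid>/exe itself, which lets the kernel follow the link to the real file no matter which filesystem view the stored path belongs to.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the file at `path` starts with the ELF magic
   bytes, 0 if it does not, -1 if it cannot be opened or read. */
static int has_elf_magic(const char *path) {
    unsigned char hdr[4];
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, hdr, sizeof hdr);
    close(fd);
    if (n != (ssize_t)sizeof hdr)
        return -1;
    /* "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
    return memcmp(hdr, "\x7f" "ELF", 4) == 0;
}

/* Hypothetical helper: check the executable of `pid` two ways.
   resolved: readlink(/proc/<pid>/exe), then open that path in OUR view of
             the filesystem (fails if the path only exists in the container).
   direct:   open /proc/<pid>/exe itself; the kernel follows the link to
             the actual file. */
static void probe_exe(pid_t pid, int *resolved, int *direct) {
    char link[64], target[4096];
    snprintf(link, sizeof link, "/proc/%d/exe", (int)pid);

    ssize_t len = readlink(link, target, sizeof target - 1);
    if (len < 0) {
        *resolved = -1;
    } else {
        target[len] = '\0'; /* readlink() does not NUL-terminate */
        *resolved = has_elf_magic(target);
    }
    *direct = has_elf_magic(link);
}
```

Called on its own pid, both checks should succeed. Run from the host (with sufficient privileges) against a pid inside a container, we would expect `direct` to stay 1 while `resolved` may be -1, or may even check an unrelated host file if the same path happens to exist on the host.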

ppershing, Oct 08 '24 08:10

I also have issues running magic-trace on Docker targets, specifically Java ones.

The following was a failed attempt to make magic-trace open the link rather than resolve it. I only realized it didn't work after posting here.

I looked into this a bit for native apps to see whether the readlink could be the source of my problems, but for my C++ apps magic-trace actually works fine both with and without opening /proc/<pid>/exe directly. I tested this on a single Docker container running a single C++ app.

I don't understand why the readlink call is there; maybe there is a good reason for it?

Here is a small C program and instructions to see the effect of removing that readlink call. It works by intercepting calls to readlink and simply returning the path it was given.

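```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

/* Interposed readlink(): instead of resolving the symlink, hand back the
   path that was passed in, so the caller opens /proc/<pid>/exe itself. */
ssize_t readlink(const char *pathname, char *buf, size_t bufsiz) {
    size_t path_len = strlen(pathname);
    /* Like the real readlink(): truncate to bufsiz, no NUL terminator,
       and return the number of bytes actually placed in buf. */
    size_t n = path_len < bufsiz ? path_len : bufsiz;
    memcpy(buf, pathname, n);
    return (ssize_t)n;
}
```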

Compile it like this:

$ gcc -shared -fPIC readlink_dummy.c -o readlink_dummy.so -ldl

and then run magic-trace with LD_PRELOAD:

$ sudo LD_PRELOAD=./readlink_dummy.so magic-trace attach -multi-thread -pid <pid>

Hailios, Nov 04 '24 10:11

> I believe the problem can be fixed by skipping Core_unix.readlink here
>
> https://github.com/janestreet/magic-trace/blob/7ac635e41438248c3a3348c039dae00854f38342/src/trace.ml#L722
>
> and just directly read the contents of the /proc/<pid>/exe.

Note: after fixing it in the place mentioned above, there is a second place that needs fixing: https://github.com/janestreet/magic-trace/blob/7ac635e41438248c3a3348c039dae00854f38342/src/elf.ml#L302. There, replace Filename_unix.realpath with Core_unix.readlink; this is needed because of the subsequent filtering based on the names in /proc/pid/maps.
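A sketch of why name-based filtering can break, assuming the filtering compares the pathname column of /proc/<pid>/maps against the executable's path. The kernel writes that column as the traced process sees it, which is the same string readlink returns for /proc/<pid>/exe. The helpers `count_maps_entries` and `count_exe_mappings` are hypothetical, written in C for illustration, and not taken from magic-trace.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count lines of /proc/<pid>/maps whose pathname
   field equals `path` exactly. Returns -1 if maps cannot be opened. */
static int count_maps_entries(pid_t pid, const char *path) {
    char maps[64], line[4352];
    snprintf(maps, sizeof maps, "/proc/%d/maps", (int)pid);
    FILE *f = fopen(maps, "r");
    if (!f)
        return -1;
    int count = 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';
        /* Fields: address perms offset dev inode [padding] pathname.
           Skip the first five space-separated fields. */
        char *p = line;
        for (int field = 0; field < 5 && p; field++) {
            p = strchr(p, ' ');
            if (p) {
                while (*p == ' ')
                    p++;
            }
        }
        if (p && strcmp(p, path) == 0)
            count++;
    }
    fclose(f);
    return count;
}

/* Hypothetical helper: match maps entries against the UNRESOLVED link
   target, i.e. the path as the (possibly containerized) process sees it. */
static int count_exe_mappings(pid_t pid) {
    char link[64], target[4096];
    snprintf(link, sizeof link, "/proc/%d/exe", (int)pid);
    ssize_t len = readlink(link, target, sizeof target - 1);
    if (len < 0)
        return -1;
    target[len] = '\0';
    return count_maps_entries(pid, target);
}
```

For a process inside a container, resolving the link with realpath from the host would typically fail with ENOENT or land on a different host file, so a string produced that way would no longer match the maps entries, while the raw readlink target would.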

I originally missed this because, while debugging, I had copied the binary to the same path on the host as it had inside the Docker container.

ppershing, Nov 04 '24 11:11

Thanks for digging into this, both of you.

@ppershing:

> this is needed because of subsequent filtering based on names from /proc/pid/maps

Could you give an example of where Filename_unix.realpath and Core_unix.readlink would result in differing behavior here? (Ideally including what /proc/$pid/maps would need to look like.)

It seems reasonable to me to have this patch upstream, but I'd like to understand the behavior change first.

Xyene, Nov 20 '24 23:11