libsystrap

Tests failing due to linker issues

Open difcsi opened this issue 1 year ago • 6 comments

Tests such as generic/time are failing because of problems with eagerly bound, dynamically linked symbols, usually calloc.

Given the library is mature enough, the convoluted way of linking the tests may be unnecessary.

difcsi, Nov 06 '24 20:11

Thanks Zoltan. Here for my memory's sake I am going to elaborate a bit on the problem, although I haven't fixed it yet.

In test/generic we see failures like the following:

$ ./time
./time: symbol lookup error: ./time: undefined symbol: calloc, version GLIBC_2.2.5

... even though:

$ objdump -T ./time | grep calloc

i.e. the command prints nothing: calloc is not among the dynamic symbols that the binary references.

My suspicion is that the dynamic linker internally depends on a calloc during its symbol binding pathway, and the message is reflecting that somehow. If so, perhaps the change (which broke these tests) was something like a switch from eager to lazy binding or vice-versa, changing when this pathway is needed and/or what it does.

stephenrkell, Nov 08 '24 00:11

Here is the backtrace from a failing request for calloc within the dynamic linker on my system (glibc 2.36).

#0  _dl_debug_vdprintf (fd=<optimized out>, tag_p=<optimized out>, tag_p@entry=1, 
    fmt=<optimized out>, fmt@entry=0x7ffff7ff360a "%s: error: %s: %s (%s)\n", 
    arg=arg@entry=0x7fffffffc7c8) at ../sysdeps/unix/sysv/linux/dl-writev.h:36
#1  0x00007ffff7fd7aea in _dl_debug_printf (
    fmt=fmt@entry=0x7ffff7ff360a "%s: error: %s: %s (%s)\n") at dl-printf.c:234
#2  0x00007ffff7fe308c in _dl_signal_cexception (errcode=0, exception=0x7fffffffc930, 
    occasion=<error reading variable: Cannot access memory at address 0xffffc8a8>)
    at dl-error-skeleton.c:136
#3  0x00007ffff7fd4b09 in _dl_lookup_symbol_x (
    undef_name=undef_name@entry=0x7ffff7ff3718 "calloc", 
    undef_map=undef_map@entry=0x7ffff7ffe300, ref=ref@entry=0x7fffffffc9a0, 
    symbol_scope=<optimized out>, version=version@entry=0x7fffffffc9d0, )
    at dl-lookup.c:797
#4  0x00007ffff7fe4102 in lookup_malloc_symbol (main_map=main_map@entry=0x7ffff7ffe300, 
    name=name@entry=0x7ffff7ff3718 "calloc", version=version@entry=0x7fffffffc9d0)
    at dl-minimal.c:64
#5  0x00007ffff7fe4219 in __rtld_malloc_init_real (main_map=main_map@entry=0x7ffff7ffe300)
    at dl-minimal.c:91
#6  0x00007ffff7fe95ff in dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2373
#7  0x00007ffff7fe50bf in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0x7fffffffcdd0, 
    dl_main=dl_main@entry=0x7ffff7fe6d20 <dl_main>)
    at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#8  0x00007ffff7fe6a2e in _dl_start_final (
    arg=<error reading variable: Cannot access memory at address 0xffffcd38>) at rtld.c:496
#9  _dl_start (arg=<optimized out>) at rtld.c:583
#10 0x00007ffff7fe58e8 in _start () from /lib64/ld-linux-x86-64.so.2

stephenrkell, Nov 08 '24 00:11

And in dl-minimal.c we find:

85        struct r_found_version version;
86        version.name = symbol_version_string (libc, GLIBC_2_0);
87        version.hidden = 0;
88        version.hash = _dl_elf_hash (version.name);
89        version.filename = NULL;
90
91        void *new_calloc = lookup_malloc_symbol (main_map, "calloc", &version);
92        void *new_free = lookup_malloc_symbol (main_map, "free", &version);
93        void *new_malloc = lookup_malloc_symbol (main_map, "malloc", &version);
94        void *new_realloc = lookup_malloc_symbol (main_map, "realloc", &version);

i.e. the code is assuming there is a calloc (and others) available, even though in the binary I have, it is not linked in.

We could work around this by bundling a malloc implementation, like dlmalloc or even just a dummy no-free one, into our test cases. To me it seems like a bug that the ld.so is assuming it has a libc... when I get a moment I'll file this on the Bugzilla and we can see what they say.
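To make the "dummy no-free" option concrete, here is a minimal sketch of a bump allocator that could be linked into a test binary. It is not the code that was eventually committed. The names `bump_malloc`, `bump_calloc`, `bump_realloc`, `bump_free` and the macro `BUMP_EXPORT_LIBC_NAMES` are hypothetical, and the 1 MiB arena size is an arbitrary assumption. The wrappers under the macro are what would actually give the binary definitions named `calloc`, `free`, `malloc` and `realloc`.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE ((size_t)1 << 20)  /* 1 MiB; arbitrary size for this sketch */
#define ALIGNMENT  ((size_t)16)       /* header size and block alignment */

static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;  /* always a multiple of ALIGNMENT */

/* Byte-wise copy, so the allocator does not call memcpy itself. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Carve a block off the arena; a 16-byte header records the requested size. */
void *bump_malloc(size_t n)
{
    if (n > ARENA_SIZE)
        return NULL;  /* also keeps the rounding below from overflowing */
    size_t rounded = (n + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
    size_t need = ALIGNMENT + rounded;
    if (need > ARENA_SIZE - arena_used)
        return NULL;
    unsigned char *hdr = arena + arena_used;
    copy_bytes(hdr, &n, sizeof n);  /* remembered for bump_realloc */
    arena_used += need;
    return hdr + ALIGNMENT;
}

void *bump_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;  /* nmemb * size would overflow */
    size_t total = nmemb * size;
    unsigned char *p = bump_malloc(total);
    if (p != NULL)
        for (size_t i = 0; i < total; i++)
            p[i] = 0;
    return p;
}

/* Always allocates a fresh block and copies; the old block is abandoned. */
void *bump_realloc(void *old, size_t n)
{
    void *fresh = bump_malloc(n);
    if (fresh != NULL && old != NULL) {
        size_t old_size;
        copy_bytes(&old_size, (unsigned char *)old - ALIGNMENT, sizeof old_size);
        copy_bytes(fresh, old, old_size < n ? old_size : n);
    }
    return fresh;
}

/* Deliberately a no-op: memory is never reclaimed. */
void bump_free(void *p)
{
    (void)p;
}

#ifdef BUMP_EXPORT_LIBC_NAMES
/* Hypothetical switch: export the names that ld.so looks up. */
void *malloc(size_t n) { return bump_malloc(n); }
void *calloc(size_t nmemb, size_t size) { return bump_calloc(nmemb, size); }
void *realloc(void *p, size_t n) { return bump_realloc(p, n); }
void free(void *p) { bump_free(p); }
#endif
```

Whether this is enough depends on the lookup succeeding: the failing lookup above is versioned (GLIBC_2.2.5), so the exported names may need to satisfy that too. Also note that a compiler may turn the copy and zeroing loops into memcpy/memset calls at higher optimization levels, which matters in a binary that has no libc.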

stephenrkell, Nov 08 '24 01:11

Looking at an old version of the glibc ld.so, I find that it did have calloc and friends as weak definitions. This holds up to at least version 2.31.
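For readers unfamiliar with the term, a weak definition is one that the link editor uses only when no strong definition of the same name is present among the objects being linked. The sketch below illustrates the mechanism in general using the GCC/Clang `weak` attribute. It is not glibc's code, and `demo_allocator_name` and `active_allocator` are hypothetical names.

```c
/* Weak definition: kept only if no strong definition of
   demo_allocator_name is present in the link. */
__attribute__((weak)) const char *demo_allocator_name(void)
{
    return "built-in fallback";
}

/* Callers simply use whichever definition the link selected. */
const char *active_allocator(void)
{
    return demo_allocator_name();
}
```

Linked on its own, `active_allocator()` returns the fallback string. Adding another object file with an ordinary (strong) definition of `demo_allocator_name` would replace it without a duplicate-symbol error.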

This commit might be relevant.

stephenrkell, Nov 08 '24 01:11

Actually, this one is the culprit.

stephenrkell, Nov 08 '24 02:11

I've now reported this here.

stephenrkell, Nov 08 '24 03:11

I believe the workaround was committed in 566760e.

stephenrkell, Mar 29 '25 12:03