liballocs
Tools don't work with DWARF 5
Trying to build liballocs on Ubuntu 22.04. Pleased to say I can get quite far; I will open a PR for a Dockerfile this week, provided I can get this last step to stop failing:
> make -f ./tools/Makefile.meta /usr/lib/meta`ldd /bin/true | grep 'libc.so.6' | sed -r 's/.*=> (.*) \(.*/\1/' | xargs readlink -f`-meta.so
tools/Makefile.meta:14: META_BASE is /usr/lib/meta
META_CC is cc
Generating frametypes
mkdir -p /usr/lib/meta/usr/lib/x86_64-linux-gnu/
errs=$( ( /home/octavel/Programs/PhD/git_repos/liballocs_to_run_locally/tools//frametypes /usr/lib/x86_64-linux-gnu/libc.so.6 3>&2 2>&1 1>&3 ) 2>/usr/lib/meta/usr/lib/x86_64-linux-gnu/libc.so.6-frametypes.c ); \
status=$?; echo "$errs" | gzip >/usr/lib/meta/usr/lib/x86_64-linux-gnu/libc.so.6-frametypes.c.log.gz; [ $status -eq 0 ] || (mv /usr/lib/meta/usr/lib/x86_64-linux-gnu/libc.so.6-frametypes.c /usr/lib/meta/usr/lib/x86_64-linux-gnu/libc.so.6-frametypes.c.err; false)
Aborted (core dumped)
make: *** [tools/Makefile.meta:130: /usr/lib/meta/usr/lib/x86_64-linux-gnu/libc.so.6-frametypes.c] Error 1
More specifically, frametypes gives this output:
> ./tools//frametypes /usr/lib/x86_64-linux-gnu/libc.so.6
read build-ID note off: 00000380
name_size: 4
desc_size: 20
Slurped build ID: 89c3cb85f9e55046776471fed05ec441581d1969
Trying: /usr/lib/debug/.build-id/89/c3cb85f9e55046776471fed05ec441581d1969.debug
warning: libdwarf reported mangled frame entries
terminate called after throwing an instance of 'dwarf::lib::Error'
[1] 129105 IOT instruction (core dumped) /usr/lib/x86_64-linux-gnu/libc.so.6
The build id it's trying to use *should* be correct:
> readelf -n /usr/lib/x86_64-linux-gnu/libc.so.6 | grep -1 build
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 89c3cb85f9e55046776471fed05ec441581d1969
I seem to remember running into an issue related to these scripts/build-IDs (which you debugged, Stephen). IIRC it wasn't finding the right libc file, but really in this case ldd /bin/true | grep 'libc.so.6' | sed -r 's/.*=> (.*) \(.*/\1/' | xargs readlink -f seems to have the right output (/usr/lib/x86_64-linux-gnu/libc.so.6), so this should be a different issue.
It may also very well be a libc version mismatch, since the Ubuntu 18.04 image ends up with glibc 2.27 while Ubuntu 22.04 is working with libc 2.35. If so, that's more annoying to debug.
Can send a Dockerfile for Ubuntu 22.04 that builds up to that point, which may aid in pinpointing the source of this bug.
Took another quick look by building libc-2.27 from source on my system and building metadata for it: that works much better, so a libc version mismatch is probably the cause. As a side note, this build doesn't fully succeed on my system either, but I'm not sure that's directly relevant; it at least gets much further than with 2.35.
Thanks Octave. Well done on getting this mostly working. Can you get a stack trace for the failing frametypes command? My guess is that libdwarfpp is choking on some DWARF it does not like. This often happens when upgrading to binaries that came out of a more recent toolchain. Your experiment building glibc 2.27's metadata successfully does slightly undermine that hypothesis, but it's still my guess.
Sure thing:
(gdb) bt 20
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737308301184) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737308301184) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737308301184, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7635476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff761b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff7edf685 in __gnu_cxx::__verbose_terminate_handler() () from /usr/local/src/liballocs/contrib/liballocstool/contrib/dwarfidl/contrib/libdwarfpp/lib/libdwarfpp.so.0
#6 0x00007ffff7eddb56 in __cxxabiv1::__terminate(void (*)()) () from /usr/local/src/liballocs/contrib/liballocstool/contrib/dwarfidl/contrib/libdwarfpp/lib/libdwarfpp.so.0
#7 0x00007ffff7eddba1 in std::terminate() () from /usr/local/src/liballocs/contrib/liballocstool/contrib/dwarfidl/contrib/libdwarfpp/lib/libdwarfpp.so.0
#8 0x00007ffff7eddcd4 in __cxa_throw () from /usr/local/src/liballocs/contrib/liballocstool/contrib/dwarfidl/contrib/libdwarfpp/lib/libdwarfpp.so.0
#9 0x00007ffff7dcc6e3 in dwarf::core::Die::Die (this=0x7fffffffd500, r=..., __in_chrg=<optimized out>, __vtt_parm=<optimized out>) at src/libdwarf-handles.cpp:105
#10 0x00007ffff7deda6e in dwarf::core::root_die::advance_cu_context (this=this@entry=0x555555861c30) at src/root.cpp:440
#11 0x00007ffff7deedda in dwarf::core::root_die::first_child (this=this@entry=0x555555861c30, it=...) at src/root.cpp:310
#12 0x00007ffff7def141 in dwarf::core::root_die::move_to_first_child (this=0x555555861c30, it=...) at src/root.cpp:345
#13 0x0000555555684ab0 in dwarf::core::iterator_df<dwarf::core::basic_die>::increment (this=this@entry=0x7fffffffdad0)
at /usr/local/src/liballocs/contrib/liballocstool/contrib/dwarfidl/contrib/libdwarfpp/include/dwarfpp/iter.hpp:584
#14 0x00005555555cbd03 in boost::iterators::iterator_core_access::increment<dwarf::core::iterator_df<dwarf::core::basic_die> > (f=...)
at /usr/include/boost/iterator/iterator_facade.hpp:556
#15 boost::iterators::detail::iterator_facade_base<dwarf::core::iterator_df<dwarf::core::basic_die>, dwarf::core::basic_die, boost::iterators::forward_traversal_tag, dwarf::core::basic_die&, long long, false, false>::operator++ (this=0x7fffffffdad0) at /usr/include/boost/iterator/iterator_facade.hpp:666
#16 main (argc=<optimized out>, argv=<optimized out>) at tools/frametypes.cpp:235
Pretty much identical stacktrace when running dwarftypes instead of frametypes, JFYI. Seems to be caused by the bit where you grab the compilation unit DIE offset.
Perhaps importantly, that stack trace was obtained by running tools/.libs/frametypes directly rather than through the libtool wrapper script; going through the wrapper makes getting a stack trace less straightforward (to me).
OK, thanks. Looks like we're trying to increment from an invalid DIE position. Not sure why. First check: can you find which file it's reading the DWARF from, and then use readelf -wi to check it actually has some DWARF in it? If you have gdb on the crash, looking at the /proc/pid/fd directory is an easy way to get the filename. For that you can use info proc stat to get the PID, or usually just print getpid(). I am guessing the code has followed a .build-ID note or .gnu_debuglink or similar to find a split-DWARF file, rather than getting the DWARF from the libc-2.35.so file proper.
One problem that can afflict Debian or Ubuntu builds is that the .build-ID in libc-2.NN.so in the libc6 package does not match the build ID of the DWARF file included in the libc6-dbg package. It's braindead but unfortunately arises from a weakness of the Debian package system: not enough information being encoded in the version numbers, and/or only being able to Depends: on a package version, not an arbitrary property of a package (whereas if this were allowed then the build ID could be added as a separate metadata field, even per-file).
Mentioning because: you might want to check the build ID against the dpkg -L of the package! You can get the build ID with something like . /path/to/liballocs/tools/debug-funcs.sh && read_build_id /path/to/file.
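For the /proc/pid/fd check suggested above, a minimal sketch, assuming a Linux /proc filesystem. The function name list_open_files is made up for illustration and is not part of liballocs; it prints the absolute paths a given process currently has open, which is one way to spot the .debug file that frametypes picked up while it is stopped in gdb.

```shell
# Hypothetical helper (not part of liballocs): list the files that a
# process currently has open, by resolving the symlinks in /proc/PID/fd.
# Assumes Linux; pipes and sockets resolve to non-path names and are skipped.
list_open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails if the fd vanished or the glob matched nothing
        target=$(readlink "$fd") || continue
        case "$target" in
            /*) printf '%s\n' "$target" ;;
        esac
    done
}
```

With the crashed process still alive under gdb, calling list_open_files with the PID reported by info proc should include the debug file among its output, assuming the file descriptor has not already been closed.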
A readelf -wi does show DWARF info (assuming I got the right file, which I believe I did), but it also prints readelf: Error: Unable to find program interpreter name. Could that be relevant?
For the build ID vs dpkg -L:
user@f7e6dd94158f:/usr/local/src/liballocs$ . ./tools/debug-funcs.sh && read_build_id /usr/lib/x86_64-linux-gnu/libc.so.6
read build-ID note off: 00000380
name_size: 4
desc_size: 20
229b7dc509053fe4df5e29e8629911f0c3bc66dd
user@f7e6dd94158f:/usr/local/src/liballocs$ dpkg -L libc6-dbg | grep 229b
user@f7e6dd94158f:/usr/local/src/liballocs$ echo $?
1
But the file that was accessed by frametypes, /usr/lib/debug/.build-id/22/9b7dc509053fe4df5e29e8629911f0c3bc66dd.debug, does show up in the dpkg -L output. Getting a bit confused over all these debug files and build IDs, but hopefully this helps.
Thanks Octave.
I think the "interpreter name" thing is irrelevant. "Interpreter" means the name of the dynamic linker. I think this is a case of readelf getting a bit confused by these lop-sided ELF files that are produced with strip --only-keep-debug or similar.
Those Debian packages look good... the build ID matches the filename. The first two digits are taken off first and used as a subdirectory name, hence the '22' here.
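The naming scheme just described also explains why grepping the dpkg -L output for 229b found nothing: in the path, a slash separates the first two digits from the rest. A small sketch of the mapping follows; the function name debug_path_for_build_id is made up for illustration, and the /usr/lib/debug/.build-id prefix is simply the one seen in the output earlier in this thread.

```shell
# Hypothetical helper: map a hex build ID to the path of its separate
# debug file. The first two hex digits become a subdirectory name and
# the remaining digits become the file name, with a .debug suffix.
debug_path_for_build_id() {
    id="$1"
    dir=$(printf '%s' "$id" | cut -c1-2)
    rest=$(printf '%s' "$id" | cut -c3-)
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$dir" "$rest"
}
```

For the ID read above, this gives /usr/lib/debug/.build-id/22/9b7dc509053fe4df5e29e8629911f0c3bc66dd.debug, so a fixed-string grep of the dpkg -L libc6-dbg output for that path (or just for 22/9b7d) should match where grep 229b did not.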
So, hmm, looks like I will have to get a debugger on this. Can you send me the debug info binary (the one with the long hex name) somehow? I am guessing it is pretty large so we may have to find some out-of-band way to send it.
Small update for the public record: we've debugged this and found it to be a bug triggered by newer DWARF features, which the submodule'd libdwarf (inside libdwarfpp, inside dwarfidl, inside liballocstool) is too old to understand. A lot of upstream changes have occurred in libdwarf so this will take a bit of careful merging that I haven't done yet.
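For anyone landing here with the same abort, a rough way to see which DWARF versions a debug file contains before running the tools on it. This is only a sketch: it assumes binutils readelf, whose --debug-dump=info output prints a Version: line in each compilation unit header, and the function name dwarf_versions is made up for illustration.

```shell
# Hypothetical helper: read `readelf --debug-dump=info` output on stdin
# and print the distinct DWARF versions found in the unit headers,
# one per line, in ascending order.
dwarf_versions() {
    awk '$1 == "Version:" { print $2 }' | sort -un
}

# Example use (the path is a placeholder):
#   readelf --debug-dump=info /path/to/file.debug | dwarf_versions
```

If 5 shows up in the output, the file uses the newer format named in the issue title, which the bundled libdwarf is too old to understand.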