Some debug info not automatically loaded for userland core
It appears that, at least in some cases, not all symbols are loaded when a corefile is opened:
delphix@ip-10-110-245-2:~$ sudo sdb $(which init) core.1
sdb> echo 0x55e87c2be070 | cast Manager* | member units
(Hashmap *)0x55e87c2bcc60
sdb> echo 0x55e87c2be070 | cast Manager* | member units.b
sdb: member: 'Hashmap' has no member 'b'
Some of the symbols have been loaded, but not all. We can find the missing .debug file and pass it to sdb manually with -s:
delphix@ip-10-110-245-2:~$ ldd $(which init)
linux-vdso.so.1 (0x00007fffee994000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcd14535000)
libsystemd-shared-237.so => /lib/systemd/libsystemd-shared-237.so (0x00007fcd140f2000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fcd13eea000)
libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fcd13ca3000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fcd13a7b000)
libmount.so.1 => /lib/x86_64-linux-gnu/libmount.so.1 (0x00007fcd13827000)
libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007fcd135da000)
...
delphix@ip-10-110-245-2:~$ readelf --string-dump=.gnu_debuglink /lib/systemd/libsystemd-shared-237.so
String dump of section '.gnu_debuglink':
[ 0] cb24b17254df4d10c026f09c8fc06e41716ea7.debug
[ 30] n^QY
delphix@ip-10-110-245-2:~$ sudo sdb -s /usr/lib/debug/.build-id/b0/cb24b17254df4d10c026f09c8fc06e41716ea7.debug $(which init) core.1
sdb> echo 0x55e87c2be070 | cast Manager* | member units.b
(struct HashmapBase){
.hash_ops = (const struct hash_ops *)0x7fd03b13ccf0,
.indirect = (struct indirect_storage){
.storage = (void *)0x55e87c35f690,
.hash_key = (uint8_t [16]){ 7, 126, 50, 4, 74, 37, 110, 155, 218, 11, 177, 80, 154, 199, 219, 37, },
.n_entries = (unsigned int)228,
.n_buckets = (unsigned int)481,
.idx_lowest_entry = (unsigned int)0,
._pad = (uint8_t [3]){},
},
.direct = (struct direct_storage){
.storage = (uint8_t [39]){ 144, 246, 53, 124, 232, 85, 0, 0, 7, 126, 50, 4, 74, 37, 110, 155, 218, 11, 177, 80, 154, 199, 219, 37, 228, 0, 0, 0, 225, 1, },
},
.type = (enum HashmapType)HASHMAP_TYPE_PLAIN,
.has_indirect = (_Bool)1,
.n_direct_entries = (unsigned int)0,
.from_pool = (_Bool)1,
}
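The path passed to -s above follows the build-ID layout commonly used for separate debug files: the first two hex digits of the build ID name a directory under /usr/lib/debug/.build-id, and the remaining digits plus ".debug" name the file. Below is a minimal Python sketch of that mapping. The helper name `build_id_debug_path` is hypothetical, and the full build ID used in the example is inferred from the path in the command above (directory "b0" plus the file name), not read from the binary.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def build_id_debug_path(build_id_hex, debug_root="/usr/lib/debug"):
    """Map a hex build ID to the conventional separate debug file path.

    Hypothetical helper for illustration; it only builds the path and
    does not check that the file exists.
    """
    build_id_hex = build_id_hex.strip().lower()
    if len(build_id_hex) < 3 or not set(build_id_hex) <= HEX_DIGITS:
        raise ValueError("expected a hex build ID of at least 3 digits")
    # First two hex digits form the directory, the rest the file name.
    return str(
        PurePosixPath(debug_root)
        / ".build-id"
        / build_id_hex[:2]
        / (build_id_hex[2:] + ".debug")
    )


if __name__ == "__main__":
    # Build ID assumed from the path used in the sdb command above.
    print(build_id_debug_path("b0cb24b17254df4d10c026f09c8fc06e41716ea7"))
```

The resulting path can then be passed to sdb with -s, as in the transcript above.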
Marked this as a drgn bug.
My hypothesis is the following:
You generated the core before the debug information for that library was installed on the system. When the core dump was generated, along with the ELF notes listing all the loaded libraries and their .gnu_debuglinks, your system had no debug link recorded for that library because it didn't exist at the time (the debug package was not installed). Thus, the core recorded that it had no debug info for that library. When sdb launched its drgn.Program instance, it looked at the core for directions on where to find debug info, and the core instructed it appropriately given the data recorded at the time (again, no debug info for this library).
You should be able to verify the above hypothesis by generating a core dump again for the same process on a system with and without the debug info for this library.
If my hypothesis is right, it would be great if drgn also looked at the binary installed in the system to see if it has a debug link now (maybe optionally print a warning too).
BTW, can you run the following?:
sudo sdb -s /lib/systemd/libsystemd-shared-237.so $(which init) core.1
If -s doesn't follow the debug link, that is a separate bug (in the example above you followed it manually yourself).
@sdimitro I didn't install any additional packages between the time when I generated the core and when I ran SDB on it. (This was on a different system than the one we were looking at together yesterday).
FWIW, gdb is able to handle this automatically:
delphix@ip-10-110-245-2:~$ sudo gdb $(which init) core.1
...
(gdb) p ((Manager*)0x55e87c2be070)->units.b
$1 = {hash_ops = 0x7fd03b13ccf0 <string_hash_ops>, {indirect = {storage = 0x55e87c35f690, hash_key = "\a~2\004J%n\233\332\v\261P\232\307\333%", n_entries = 228, n_buckets = 481, idx_lowest_entry = 2, _pad = "\000\000"}, direct = {
storage = "\220\366\065|\350U\000\000\a~2\004J%n\233\332\v\261P\232\307\333%\344\000\000\000\341\001\000\000\002\000\000\000\000\000"}}, type = HASHMAP_TYPE_PLAIN, has_indirect = true, n_direct_entries = 0, from_pool = true}
So I think it should be possible to do, one way or another.
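For reference, gdb's documented lookup for a .gnu_debuglink name tries the directory of the binary, then a .debug subdirectory of it, then the global debug directory combined with the binary's leading directories (it also has a separate build-ID lookup). Here is a rough Python sketch of that candidate list. The function name `debuglink_candidates` is hypothetical, and nothing in this thread establishes which of these paths, if any, gdb actually used for this core.

```python
import posixpath


def debuglink_candidates(binary_path, debuglink, debug_root="/usr/lib/debug"):
    """List candidate locations for a debug-link file, gdb style.

    Hypothetical helper: it only builds the candidate paths, in search
    order, and does not touch the filesystem.
    """
    directory = posixpath.dirname(binary_path)
    return [
        # 1. Same directory as the binary.
        posixpath.join(directory, debuglink),
        # 2. A ".debug" subdirectory next to the binary.
        posixpath.join(directory, ".debug", debuglink),
        # 3. Global debug directory plus the binary's leading directories.
        posixpath.join(debug_root, directory.lstrip("/"), debuglink),
    ]


if __name__ == "__main__":
    for path in debuglink_candidates(
        "/lib/systemd/libsystemd-shared-237.so",
        "cb24b17254df4d10c026f09c8fc06e41716ea7.debug",
    ):
        print(path)
```

A debugger-side fallback along these lines would check each candidate for existence and load the first match.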
> BTW, can you run the following?: sudo sdb -s /lib/systemd/libsystemd-shared-237.so $(which init) core.1
Yes, that works! That's neat, I didn't realize that.
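Since pointing -s at the library itself works, one possible stopgap is to pass every shared library reported by ldd. The Python sketch below parses ldd-style output and builds such a command line. Both helper names are hypothetical, it assumes that sdb accepts -s more than once, and "/sbin/init" is only a placeholder for whatever $(which init) resolves to.

```python
import shlex


def ldd_library_paths(ldd_output):
    """Extract resolved library paths from ldd-style output."""
    paths = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso.so.1 has no on-disk path
        fields = line.split("=>", 1)[1].split()
        # Skip unresolved entries such as "libfoo.so => not found".
        if fields and fields[0].startswith("/"):
            paths.append(fields[0])
    return paths


def sdb_command(binary, core, libraries):
    """Build an sdb command line with one -s per library (assumed usage)."""
    args = ["sdb"]
    for lib in libraries:
        args += ["-s", lib]
    args += [binary, core]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    sample = """\
linux-vdso.so.1 (0x00007fffee994000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcd14535000)
libsystemd-shared-237.so => /lib/systemd/libsystemd-shared-237.so (0x00007fcd140f2000)
"""
    # "/sbin/init" is a placeholder path, not taken from the report.
    print(sdb_command("/sbin/init", "core.1", ldd_library_paths(sample)))
```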