patchelf
patchelf copied to clipboard
Fix --remove-needed not updating .gnu.version_r
Fixes #214
Constructed after reading @njsmith's great work on #84 / #85 (thus some variable names being the same).
Technical considerations
Even if the section is theorically a compact array of Elf_Verneed and Elf_Vernaux entries (with the number of top Elf_Verneed entries being written in 1 or 2 places) there is redundancy with a chained list where each entry has the next entry's offset in its vn_next field (with a value of 0 ending the list… thus giving a second way to know we have reached the end, in addition to the count field(s)).
So there could be holes in the section (when vn_next > sizeof(ElfXX_Verneed + its attached ElfXX_Vernauxs)), and maybe (admittedly stealthy) data to preserve in those holes.
To adapt to that uncertainty, the patch tries to be as conservative as possible in its writes:
instead of shifting the last entries to occupy the place let by the removed ones, we simply use the previous entry's vn_next to jump over the removed one (except on the first entry, which has no previous, but is pointed at by the header's sh_offset).
To be done
Two questions still make it a work in progress (in addition to a firm code review, because I'm not familiar at all with ELF):
- how to handle removing of the last entry? Can I let a section with a size of 0, or should I remove it entirely?
- what to do with the versioned symbols that "loose" their version indication?
this question will be handled in the first comment, to avoid polluting this description once resolved
Orphan symbols
Original problem
I started on FreeBSD 14.2, with a simple C++ program:
#include <iostream>
int main(int argc, char ** argv)
{
std::cerr << "Plouf" << std::endl;
return 0;
}
This program was first compiled with the system clang++, which linked it to /lib/libc++.so.1 and /lib/libcxxrt.so.1.
After adding a custom clang toolchain (18.1.8 instead of the stock 18.1.6, but with its own libc++.so.1 and libc++abi.so.1),
the program crashed after having run alright (cerr and endl correctly output), in its destructors part (~sentry tries to know if we are in exception handling with a call to uncaught_exception() that accesses something unitialized or overwritten).
I can resolve this crash either by:
--add-needed libc++abi.so.1which is inserted beforelibcxxrt.so.1and thus gets called instead oflibcxxrt.so.1over their common symbols--replace-needed libcxxrt.so.1 libc++abi.so.1of course--remove-needed libcxxrt.so.1thus lettinglibc++.so.1as the only needed lib (and thus the lookup for needed libs will cascade to the ones mentioned inlibc++.so.1, includinglibc++abi.so.1)
However patchelf --remove-needed without this PR's patch corrupts the binary for FreeBSD's loader (ld-elf.so.1), either when launching it or when inspecting it with ldd:
ld-elf.so.1: /tmp/1.bin.sanscxxrt: Unexpected inconsistency: dependency libcxxrt.so.1 not found
readelf is a bit more robust and sees that the needed lib has been removed:
0x0000000000000001 NEEDED Shared library: [libc++.so.1]
- 0x0000000000000001 NEEDED Shared library: [libcxxrt.so.1]
0x0000000000000001 NEEDED Shared library: [libm.so.5]
But the entry still in .gnu.version_r triggers the "Unexpected inconsistency":
Symbol table '.dynsym' contains 26 entries:
Num: Value Size Type Bind Vis Ndx Name
[…]
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_begin_catch@CXXABI_1.3 (3)
[…]
Version symbol section (.gnu.version):
000: 0 *local* 2 FBSD_1.7 1 *global* 1 *global*
004: 1 *global* 1 *global* 1 *global* 3 CXXABI_1.3
008: 1 *global* 3 CXXABI_1.3 4 GCC_3.0 1 *global*
00c: 5 GLIBCXX_3.4 1 *global* 1 *global* 1 *global*
010: 1 *global* 1 *global* 6 FBSD_1.0 1 *global*
014: 1 *global* 1 *global* 1 *global* 1 *global*
018: 1 *global* 3 CXXABI_1.3
Version needed section (.gnu.version_r):
0x0000 vn_version: 1 vn_file: libcxxrt.so.1 vn_cnt: 2
0x0030 vna_name: CXXABI_1.3 vna_flags: 0 vna_other: 3
0x0040 vna_name: GLIBCXX_3.4 vna_flags: 0 vna_other: 5
0x0010 vn_version: 1 vn_file: libgcc_s.so.1 vn_cnt: 1
0x0050 vna_name: GCC_3.0 vna_flags: 0 vna_other: 4
0x0020 vn_version: 1 vn_file: libc.so.7 vn_cnt: 2
0x0060 vna_name: FBSD_1.0 vna_flags: 0 vna_other: 6
0x0070 vna_name: FBSD_1.7 vna_flags: 0 vna_other: 2
(here we see that the vn_versions are compacted at the start of the section (offsets 0x0, 0x10, 0x20), with the auxiliaries occupying the end of the section; and the vna_other field of the auxiliary section is the version name's ID in the .gnu.version section, which is appended to symbols expected in this library too in the .dynsym section)
Patching patchelf
With this PR's patch, we get our .gnu.version_r right:
Version needed section (.gnu.version_r):
0x0000 vn_version: 1 vn_file: libgcc_s.so.1 vn_cnt: 1
0x0040 vna_name: GCC_3.0 vna_flags: 0 vna_other: 4
0x0010 vn_version: 1 vn_file: libc.so.7 vn_cnt: 2
0x0050 vna_name: FBSD_1.0 vna_flags: 0 vna_other: 6
0x0060 vna_name: FBSD_1.7 vna_flags: 0 vna_other: 2
(note the offsets shifted by -0x10 due to the "first entry gets trimmed" rule, and offsets 0x20 and 0x30 not being reused, because the ghost auxiliary entries CXXABI_1.3 and GLIBCXX_3.4 still sit there).
The binary then works as expected which was our primary goal.
However…
The internal orphan problem
Although the binary runs well, when readelfing it, the missing .gnu.version_r entry disrupts resolving in both .gnu.version and .dynsym.
Both binutils's readelf and the stock readelf of FreeBSD will fail to find the version node's name.
But binutils will fail gracefully, making it appear as an unversioned / corrupted symbol:
[…]
6: […]
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_begin_catch@@<corrupt>
[…]
Version symbols section '.gnu.version' contains 26 entries:
Addr: 0x0000000000200578 Offset: 0x00000578 Link: 3 (.dynsym)
000: 0 (*local*) 2 (FBSD_1.7) 1 (*global*) 1 (*global*)
004: 1 (*global*) 1 (*global*) 1 (*global*) 3
008: 1 (*global*) 3 4 (GCC_3.0) 1 (*global*)
00c: 5 1 (*global*) 1 (*global*) 1 (*global*)
010: 1 (*global*) 1 (*global*) 6 (FBSD_1.0) 1 (*global*)
014: 1 (*global*) 1 (*global*) 1 (*global*) 1 (*global*)
018: 1 (*global*) 3
(note the @@<corrupt> in .dynsym, and the present-but-unnamed entries ID 3 in .gnu.version)
However FreeBSD's readelf will stop reading those sections as soon as the first error is encountered:
Symbol table '.dynsym' contains 26 entries:
[…]
6: […]
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_begin_catch
[and .dynsym dumping ends here]
Version symbol section (.gnu.version):
000: 0 *local* 2 FBSD_1.7 1 *global* 1 *global*
004: 1 *global* 1 *global* 1 *global*
[and .gnu.version dumping ends here]
[other sections continue without error]
(and it warns twice of "readelf: invalid versym version index 3", once per section it then stops dumping)
So…
Question for this PR
This partial test (limited to FreeBSD) shows that, at least on 1 platform, we can rely on the runtime loader's robustness to handle dangerously formed binaries (fallbacking to an unversioned symbol when not finding the version node's name?),
Are there use cases where this PR could prove useful as is (and particularly: on NixOS, or with standard Linux distribs)
or should the removal of the versioned library always un-version every related symbol too?
@Ma27 Any thoughts on this?