libxcrypt
libxcrypt copied to clipboard
lto linking, missing symbols
gcc 8.2.0 binutils 2.31.1 glibc 2.28 (with --disable-crypt)
crosscompilation for aarch64, libxcrypt v4.1.1 configure --enable-shared --disable static
If libxcrypt is built with lto in *flags almost all symbols are missing from the library
lto build - bad
ls libcrypt.so.1.1.0 -ah
-rwxr-xr-x 1 asavah asavah 9496 Aug 5 00:46 libcrypt.so.1.1.0
nm -DC libcrypt.so.1.1.0
w __cxa_finalize
0000000000000000 A GLIBC_2.17
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000000000 A OW_CRYPT_1.0
0000000000000000 A XCRYPT_2.0
Non lto build (gold linker) - good
ls libcrypt.so.1.1.0 -al
-rwxr-xr-x 1 asavah asavah 146864 Aug 5 00:49 libcrypt.so.1.1.0
nm -DC libcrypt.so.1
U __assert_fail
U close
000000000000a470 T crypt
000000000000a470 T crypt
0000000000008390 T crypt_gensalt
0000000000008390 T crypt_gensalt
0000000000008390 T crypt_gensalt
0000000000005530 T crypt_gensalt_ra
0000000000005530 T crypt_gensalt_ra
0000000000005530 T crypt_gensalt_ra
00000000000053b0 T crypt_gensalt_rn
00000000000053b0 T crypt_gensalt_rn
00000000000053b0 T crypt_gensalt_rn
0000000000005350 T crypt_r
0000000000005350 T crypt_r
0000000000005250 T crypt_ra
0000000000005250 T crypt_ra
00000000000051c0 T crypt_rn
00000000000051c0 T crypt_rn
w __cxa_finalize
000000000000aed0 T encrypt
000000000000aeb0 T encrypt_r
U __errno_location
U explicit_bzero
000000000000a470 T fcrypt
U free
U getentropy
U getrandom
0000000000000000 A GLIBC_2.17
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U malloc
U memcmp
U memcpy
U memset
U open
0000000000000000 A OW_CRYPT_1.0
U read
U realloc
000000000000aec0 T setkey
000000000000aea0 T setkey_r
U snprintf
U sprintf
U strlen
U strncmp
U strspn
U strtoul
U syscall
0000000000000000 A XCRYPT_2.0
Is this expected ?
Long term known problem with LTO and LD versioning scripts and GCC [1]. Has been brought up with FUSE [2] last year, too. Clang has some issues with LTO too, but that's a different story.
Maybe this will work with GCC 9…
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48200 [2] https://github.com/libfuse/libfuse/issues/198
Thanks for the explanation, feel free to close, maybe it's worth mentioning in the README ?
Maybe we should crash the configure script with a descriptive error, if -flto is detected in the CFLAGS or LDFLAGS… @zackw and @ldv-alt What do you think?
Given that -flto can probably be used along with some other options that mitigate this problem, I think a more fine-grained approach is needed.
I looked briefly at what it would take to detect problematic CC/CFLAGS settings within configure, and it doesn't look easy. It is finicky to write autoconf tests involving multiple object files, but I've done that before. The main problem is, for an accurate check we would need to run libtool, but libtool doesn't even exist until config.status runs. If someone could clue me in how to run ltmain.sh from within configure and have it behave as if it were the generated libtool, I could take it the rest of the way, but I don't have time to figure out that part myself.
"make check" does detect the problem, at least. For now I think we should leave this bug open and mention it in the README.
Documented the limitation in 6b06074. I'm still happy to write a configure check if someone tells me how to run libtool from inside configure.
@asavah, @zackw FYI, building with -flto is working with Clang / LLVM >= 7.0.0 now.
Anyways, test-symbols-static and test-symbols-renames are failing, because nm does not support Clang's lto-object format out of the box.
@besser82 Neat. I wonder what they did to make it work. I'll maybe see if I can fix the tests.
Re-opening as GCC 10.2 supports the needed initial bits now.
Here's a different perspective, which might or might not be helpful:
LTO feels primarily like an unjustified risk to me: exposing potential cross-translation-unit instances of undefined behavior, making them actually manifest themselves, sometimes as security vulnerabilities. Do we really want that? What for? Instead of supporting LTO, how about explicitly not supporting it - making attempted builds with LTO fail and not produce output files? Document that libxcrypt must not be built with LTO.
Maybe add a flag to "support" and enable build with LTO for stress-testing only (not production), so that we get a chance to detect and fix some of those instances of UB ourselves before someone possibly reuses the code somewhere with LTO or merges source files. Somehow ensure that build wouldn't be inadvertently usable in production.
BTW, I recall @besser82 also wanted to support build with auto-merging of source files, which would have similar bug-exposing and security implications, and I similarly advised against that except maybe as a stress-test.
I'm no LTO expert, and am not up to date on whether it's somehow possibly required somewhere in the ecosystem now. Please correct me if so.
Accidentally closed this.
It's not clear to me what still needs to be done here. Off the top of my head, some or all of these may be needed:
- [ ] Verify that, when a compiler that supports
__attribute__((symver))is in use, a shared library build with-fltoexposes the same set of versioned symbols, including all compatibility aliases and compatibility-only symbols, as a build without that option, for all values of all configure options that affect what symbols are exposed. Test both GCC and Clang. - [ ] Find and eradicate all instances of undefined behavior that are promoted from latent to manifest bugs by
-flto. The test suite should be thorough enough to catch most of them for valid input; I'm not sure about invalid input. A crude but effective way to determine whether the compiler thinks the code might have UB is to compile with-fsanitize=undefinedand then look through the machine code for calls to the UBSan runtime. Watch out for compiler bugs exposed by-flto. Testing on at least two non-x86 CPU architectures will be needed. - [ ] Document the supported compiler versions for use of
-flto. (#176 makes a start on this.) - [ ] Add
-fltoto the CI test matrix. - [ ] Document whether we think it is a good idea to use
-flto. I am currently of the opinion that it is not a good idea, even after all the above testing, basically because I doubt we can be sure enough that we got all the UB (@solardiz said similar things above: https://github.com/besser82/libxcrypt/issues/24#issuecomment-674139907) and I've heard a number of horror stories about LTO exposing compiler bugs. I am open to persuasion.
What did I miss?
I mistakenly thought PR #109 had already been merged. Further discussion should probably take place there.