libxcrypt icon indicating copy to clipboard operation
libxcrypt copied to clipboard

lto linking, missing symbols

Open asavah opened this issue 7 years ago • 13 comments

gcc 8.2.0 binutils 2.31.1 glibc 2.28 (with --disable-crypt)

crosscompilation for aarch64, libxcrypt v4.1.1 configure --enable-shared --disable static

If libxcrypt is built with lto in *flags almost all symbols are missing from the library

lto build - bad

ls libcrypt.so.1.1.0 -ah
-rwxr-xr-x 1 asavah asavah 9496 Aug  5 00:46 libcrypt.so.1.1.0

nm -DC libcrypt.so.1.1.0
                 w __cxa_finalize
0000000000000000 A GLIBC_2.17
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000000000 A OW_CRYPT_1.0
0000000000000000 A XCRYPT_2.0

Non lto build (gold linker) - good

ls libcrypt.so.1.1.0 -al
-rwxr-xr-x 1 asavah asavah 146864 Aug  5 00:49 libcrypt.so.1.1.0


nm -DC libcrypt.so.1
                 U __assert_fail
                 U close
000000000000a470 T crypt
000000000000a470 T crypt
0000000000008390 T crypt_gensalt
0000000000008390 T crypt_gensalt
0000000000008390 T crypt_gensalt
0000000000005530 T crypt_gensalt_ra
0000000000005530 T crypt_gensalt_ra
0000000000005530 T crypt_gensalt_ra
00000000000053b0 T crypt_gensalt_rn
00000000000053b0 T crypt_gensalt_rn
00000000000053b0 T crypt_gensalt_rn
0000000000005350 T crypt_r
0000000000005350 T crypt_r
0000000000005250 T crypt_ra
0000000000005250 T crypt_ra
00000000000051c0 T crypt_rn
00000000000051c0 T crypt_rn
                 w __cxa_finalize
000000000000aed0 T encrypt
000000000000aeb0 T encrypt_r
                 U __errno_location
                 U explicit_bzero
000000000000a470 T fcrypt
                 U free
                 U getentropy
                 U getrandom
0000000000000000 A GLIBC_2.17
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U malloc
                 U memcmp
                 U memcpy
                 U memset
                 U open
0000000000000000 A OW_CRYPT_1.0
                 U read
                 U realloc
000000000000aec0 T setkey
000000000000aea0 T setkey_r
                 U snprintf
                 U sprintf
                 U strlen
                 U strncmp
                 U strspn
                 U strtoul
                 U syscall
0000000000000000 A XCRYPT_2.0

Is this expected ?

asavah avatar Aug 04 '18 21:08 asavah

Long term known problem with LTO and LD versioning scripts and GCC [1]. Has been brought up with FUSE [2] last year, too. Clang has some issues with LTO too, but that's a different story.

Maybe this will work with GCC 9…

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48200 [2] https://github.com/libfuse/libfuse/issues/198

besser82 avatar Aug 05 '18 14:08 besser82

Thanks for the explanation, feel free to close, maybe it's worth mentioning in the README ?

asavah avatar Aug 05 '18 14:08 asavah

Maybe we should crash the configure script with a descriptive error, if -flto is detected in the CFLAGS or LDFLAGS… @zackw and @ldv-alt What do you think?

besser82 avatar Aug 05 '18 17:08 besser82

Given that -flto can probably be used along with some other options that mitigate this problem, I think a more fine-grained approach is needed.

ldv-alt avatar Aug 05 '18 17:08 ldv-alt

I looked briefly at what it would take to detect problematic CC/CFLAGS settings within configure, and it doesn't look easy. It is finicky to write autoconf tests involving multiple object files, but I've done that before. The main problem is, for an accurate check we would need to run libtool, but libtool doesn't even exist until config.status runs. If someone could clue me in how to run ltmain.sh from within configure and have it behave as if it were the generated libtool, I could take it the rest of the way, but I don't have time to figure out that part myself.

"make check" does detect the problem, at least. For now I think we should leave this bug open and mention it in the README.

zackw avatar Aug 06 '18 14:08 zackw

Documented the limitation in 6b06074. I'm still happy to write a configure check if someone tells me how to run libtool from inside configure.

zackw avatar Aug 29 '18 16:08 zackw

@asavah, @zackw FYI, building with -flto is working with Clang / LLVM >= 7.0.0 now.

Anyways, test-symbols-static and test-symbols-renames are failing, because nm does not support Clang's lto-object format out of the box.

besser82 avatar Dec 21 '18 16:12 besser82

@besser82 Neat. I wonder what they did to make it work. I'll maybe see if I can fix the tests.

zackw avatar Jan 24 '19 15:01 zackw

Re-opening as GCC 10.2 supports the needed initial bits now.

besser82 avatar Aug 13 '20 16:08 besser82

Here's a different perspective, which might or might not be helpful:

LTO feels primarily like an unjustified risk to me: exposing potential cross-translation-unit instances of undefined behavior, making them actually manifest themselves, sometimes as security vulnerabilities. Do we really want that? What for? Instead of supporting LTO, how about explicitly not supporting it - making attempted builds with LTO fail and not produce output files? Document that libxcrypt must not be built with LTO.

Maybe add a flag to "support" and enable build with LTO for stress-testing only (not production), so that we get a chance to detect and fix some of those instances of UB ourselves before someone possibly reuses the code somewhere with LTO or merges source files. Somehow ensure that build wouldn't be inadvertently usable in production.

BTW, I recall @besser82 also wanted to support build with auto-merging of source files, which would have similar bug-exposing and security implications, and I similarly advised against that except maybe as a stress-test.

I'm no LTO expert, and am not up to date on whether it's somehow possibly required somewhere in the ecosystem now. Please correct me if so.

solardiz avatar Aug 14 '20 15:08 solardiz

Accidentally closed this.

besser82 avatar Mar 31 '21 06:03 besser82

It's not clear to me what still needs to be done here. Off the top of my head, some or all of these may be needed:

  • [ ] Verify that, when a compiler that supports __attribute__((symver)) is in use, a shared library build with -flto exposes the same set of versioned symbols, including all compatibility aliases and compatibility-only symbols, as a build without that option, for all values of all configure options that affect what symbols are exposed. Test both GCC and Clang.
  • [ ] Find and eradicate all instances of undefined behavior that are promoted from latent to manifest bugs by -flto. The test suite should be thorough enough to catch most of them for valid input; I'm not sure about invalid input. A crude but effective way to determine whether the compiler thinks the code might have UB is to compile with -fsanitize=undefined and then look through the machine code for calls to the UBSan runtime. Watch out for compiler bugs exposed by -flto. Testing on at least two non-x86 CPU architectures will be needed.
  • [ ] Document the supported compiler versions for use of -flto. (#176 makes a start on this.)
  • [ ] Add -flto to the CI test matrix.
  • [ ] Document whether we think it is a good idea to use -flto. I am currently of the opinion that it is not a good idea, even after all the above testing, basically because I doubt we can be sure enough that we got all the UB (@solardiz said similar things above: https://github.com/besser82/libxcrypt/issues/24#issuecomment-674139907) and I've heard a number of horror stories about LTO exposing compiler bugs. I am open to persuasion.

What did I miss?

zackw avatar Mar 26 '24 18:03 zackw

I mistakenly thought PR #109 had already been merged. Further discussion should probably take place there.

zackw avatar Mar 26 '24 18:03 zackw