OpenSSL 3.5 crash

Open ManickaP opened this issue 5 months ago • 19 comments

Trying to use msquic (main) with the new OpenSSL 3.5+ TLS backend to test with our .NET System.Net.Quic tests, it immediately crashes with:

/home/manicka/repositories/runtime.2/artifacts/bin/testhost/net10.0-linux-Debug-x64/dotnet: symbol lookup error: /usr/lib/libmsquic.so.2: undefined symbol: ossl_time_now

Environment details:

OS: Arch Linux 6.15.2-arch1-1
OpenSSL 3.5.0 8 Apr 2025 (Library: OpenSSL 3.5.0 8 Apr 2025)
MsQuic: build from main c69379c989ff72b088ce1014f2e2587a90d58b87

EDIT: no core dump file is created when this happens

ManickaP avatar Jul 01 '25 07:07 ManickaP

@nhorman Could this be due to a mismatch between the OpenSSL version shipping on the OS and the version MsQuic is building against?

anrossi avatar Jul 01 '25 16:07 anrossi

That's... an odd one. I would think that, yes, it at least in part has to do with the differences between the OpenSSL that msquic ships and the system-installed instance, but I don't think it's expressly a versioning issue.

ossl_time_now is definitely an OpenSSL function. It's been in place since commit d6bfdf6789f65b1b503f0cdd56010705f7c632d0 (circa 2022), so it would exist in any recent (3.2 or later, I think) release of OpenSSL.

Here's the weird thing: it's an internal function, i.e.:

objdump -t ./libmsquic.so | grep ossl_time_now
0000000000126130 l     F .text	000000000000009e              ossl_time_now

It's a local symbol, not meant to be exposed to any outside user, and the objdump output above seems to confirm its local scope, at least when I build here. So nothing outside of OpenSSL should be attempting to use it, or even look it up, beyond ELF file interrogation for debugging purposes.
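
For anyone reading along, the binding can be read off the flag column of objdump -t output: "l" means local, "g" means global, and an undefined reference shows *UND* where the section name would be. A minimal sketch of that reading, assuming the usual GNU objdump column layout (the helper name classify_objdump_symbols is made up for illustration):

```shell
# Hypothetical helper: classify symbols from `objdump -t` output on stdin.
# Assumes GNU objdump layout: address, flag characters, section, size, name.
classify_objdump_symbols() {
    awk '
        # An undefined reference has *UND* in place of a section name.
        /\*UND\*/ { print $NF, "undefined"; next }
        # Otherwise the first flag character gives the binding:
        # l = local, g = global (weak and other bindings are not handled here).
        $2 == "l" { print $NF, "local"; next }
        $2 == "g" { print $NF, "global"; next }
    '
}

# Example usage (path is a placeholder):
#   objdump -t ./libmsquic.so | grep ossl_time_now | classify_objdump_symbols
```

Header lines and symbols with other bindings are silently skipped, so this is only a quick triage aid, not a full parser.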

FWIW, building with quictls doesn't show this symbol as present, as it's forked at 3.1.7, which didn't contain the function at all. That said, both the openssl and quictls builds of libmsquic show lots of function symbols with local visibility, representing internal-use-only code from both libraries.

That all said, looking at the libcrypto.so installed as the system library on my Fedora 42 workstation, I see this:

objdump -t /usr/lib64/libcrypto.so.3

/usr/lib64/libcrypto.so.3:     file format elf64-x86-64

SYMBOL TABLE:
no symbols

The Fedora developers, as part of their packaging process, strip the symbol table from libraries into separate debuginfo packages. So if dotnet (and I'll preface this by saying I have no idea what internal magic dotnet does) attempts to perform something like a dlsym() lookup on ossl_time_now, or any other local symbol, and uses the system libcrypto in this environment rather than libmsquic, I would expect that to fail.

What I'm most curious about is: why is something in dotnet attempting to resolve a symbol that has local scope in the first place? That doesn't make sense to me.

@ManickaP can you run objdump -t on your built libmsquic library and look for the ossl_time_now symbol, confirming its local visibility? Let's at least make sure that your build results match what I'm seeing here.

After that, could you run your reproducer command line with the LD_DEBUG=all environment variable set and post the output here? Let's see if we can trace the loading of all your objects and maybe catch dotnet in the act of either looking up a symbol where it shouldn't or otherwise trying to resolve a symbol that it needn't. It might also be useful to do an identical test with quictls so we can see how that loading process deals with local visibility symbols.
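
A rough sketch of how that trace could be narrowed to a single symbol, assuming a glibc-based system (LD_DEBUG and LD_DEBUG_OUTPUT are glibc loader features). It uses LD_DEBUG=symbols rather than all to keep the log smaller. The helper name lookups_for_symbol and the /tmp/ld-debug prefix are placeholders, and the log line format shown is what glibc typically prints, which may vary between versions:

```shell
# Hypothetical helper: from LD_DEBUG=symbols log text on stdin, print every
# object the loader searched while resolving the given symbol.
lookups_for_symbol() {
    sym=$1
    # glibc typically logs lines like:
    #   <pid>: symbol=NAME;  lookup in file=/path/to/lib.so [0]
    grep -F "symbol=${sym};" | sed 's/.*lookup in file=//'
}

# Example usage (replace <test command> with the real reproducer):
#   LD_DEBUG=symbols LD_DEBUG_OUTPUT=/tmp/ld-debug <test command>
#   cat /tmp/ld-debug.* | lookups_for_symbol ossl_time_now
```

With LD_DEBUG_OUTPUT set, glibc writes one log file per process with the PID appended to the prefix, which keeps the trace out of the test output.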

Right now my thinking is that, given that quictls works and openssl doesn't (I think that's the case, yes @ManickaP?), and given that both have lots of internal/local symbols, there is something unique about ossl_time_now that your environment is tripping over. The above should give us some visibility into what's going on.

nhorman avatar Jul 01 '25 17:07 nhorman

I think this might be a problem 😄

objdump -t /usr/lib/libmsquic.so.2 | grep ossl_time_now
0000000000000000         *UND*	0000000000000000              ossl_time_now

It's possible that I'm not building the libmsquic correctly. I'm using these options:

-DCMAKE_BUILD_TYPE=Debug -DQUIC_BUILD_TOOLS=off -DQUIC_BUILD_TEST=off -DQUIC_BUILD_PERF=off -DQUIC_ENABLE_LOGGING=true -DQUIC_USE_SYSTEM_LIBCRYPTO=true -DQUIC_TLS_LIB=openssl

ManickaP avatar Jul 02 '25 09:07 ManickaP

that is interesting.

If I build like you are, using cmake directly:

nhorman@hmsbeagle:~/git/msquic$ mkdir build
nhorman@hmsbeagle:~/git/msquic$ cd build
nhorman@hmsbeagle:~/git/msquic/build$ cmake -DCMAKE_BUILD_TYPE=Debug -DQUIC_BUILD_TOOLS=off -DQUIC_BUILD_TEST=off -DQUIC_BUILD_PERF=off -DQUIC_ENABLE_LOGGING=true -DQUIC_USE_SYSTEM_LIBCRYPTO=true -DQUIC_TLS_LIB=openssl ..
...
make

I get the same results you do:

nhorman@hmsbeagle:~/git/msquic/build$ objdump -t ./bin/Debug/libmsquic.so.2 | grep ossl_time_now
0000000000000000         *UND*	0000000000000000              ossl_time_now

which is super weird, as ossl_time_now, as noted previously, isn't an exported symbol. What's even more odd is that, using this build strategy, libmsquic links dynamically to OpenSSL's libcrypto, but not to libssl:

nhorman@hmsbeagle:~/git/msquic/build$ ldd ./bin/Debug/libmsquic.so.2
	linux-vdso.so.1 (0x00007f6cda302000)
	libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007f6cd9c00000)
	libatomic.so.1 => /lib64/libatomic.so.1 (0x00007f6cda1c5000)
	libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f6cda1b7000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f6cd9a0e000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f6cda194000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f6cda304000)

Compare that to building with the powershell scripts, the way it does in CI:

nhorman@hmsbeagle:~/git/msquic$ pwsh
PowerShell 7.5.0

   A new PowerShell stable release is available: v7.5.2 
   Upgrade now, or check out the release page at:       
     https://aka.ms/PowerShell-Release?tag=v7.5.2       

PS /home/nhorman/git/msquic> ./scripts/build.ps1 -Config Release -Platform linux -Arch x64 -Tls openssl
...

In which the resultant libmsquic links statically to both libcrypto and libssl:

PS /home/nhorman/git/msquic> ldd ./artifacts/bin/linux/x64_Release_openssl/libmsquic.so
	linux-vdso.so.1 (0x00007f31a6067000)
	libatomic.so.1 => /lib64/libatomic.so.1 (0x00007f31a602e000)
	libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f31a6020000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f31a580e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f31a6069000)
PS /home/nhorman/git/msquic> 

And as a result ossl_time_now is now defined:

PS /home/nhorman/git/msquic> objdump -t ./artifacts/bin/linux/x64_Release_openssl/libmsquic.so | grep ossl_time_now
0000000000126130 l     F .text  000000000000009e              ossl_time_now

So I think what's happening here is that there is a discrepancy between the cmake build environment and the powershell target.

I think what we're seeing is that libssl is always linked statically into libmsquic, and libssl.a references ossl_time_now. It expects libcrypto.a, which is where ossl_time_now is defined, to be linked statically as well, so the symbol is available at static link time.

However, when libssl is linked statically but libcrypto is linked dynamically, the reference remains undefined. Because it's not exported via our linker script (nor should it be), we get an undefined symbol that's just never going to get resolved at run time by the dynamic loader.
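
One way to spot this kind of mismatch before running anything is to compare the names a library leaves undefined against the names its dynamic dependencies export. A minimal sketch, assuming both lists were produced with nm -D from GNU binutils; the helper name unresolved_symbols and the file names are placeholders:

```shell
# Hypothetical helper: print names listed in file $1 (symbols a library needs)
# that do not appear in file $2 (symbols its dependencies export).
# Both files hold one symbol name per line.
unresolved_symbols() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$2" "$1"
}

# Example usage (paths are placeholders; sed strips any @VERSION suffix):
#   nm -D --undefined-only ./libmsquic.so.2 | awk '{print $NF}' | sed 's/@.*//' > needed.txt
#   nm -D --defined-only /usr/lib/libcrypto.so.3 | awk '{print $NF}' | sed 's/@.*//' > provided.txt
#   unresolved_symbols needed.txt provided.txt
```

Names that come from libc, libnuma and the other dependencies will be printed too unless their exports are appended to the provided list, so the interesting entries are the ossl_* ones.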

In fact, I think I see where it's happening. Fishing through the build directory, I find ./src/bin/CMakeFiles/msquic.dir/build.make, generated by cmake, in which we see:

msquic_EXTERNAL_OBJECTS =

bin/Debug/libmsquic.so.2.6.0: src/bin/CMakeFiles/msquic.dir/linux/init.c.o
bin/Debug/libmsquic.so.2.6.0: src/bin/CMakeFiles/msquic.dir/build.make
bin/Debug/libmsquic.so.2.6.0: src/bin/CMakeFiles/msquic.dir/compiler_depend.ts
bin/Debug/libmsquic.so.2.6.0: obj/Debug/libcore.a
bin/Debug/libmsquic.so.2.6.0: obj/Debug/libmsquic_platform.a
bin/Debug/libmsquic.so.2.6.0: _deps/opensslquic-build/openssl/lib/libssl.a
bin/Debug/libmsquic.so.2.6.0: /usr/lib64/libcrypto.so

Note that the libssl.a archive and the (system!) provided libcrypto.so are both getting pulled in as dependency objects.

@anrossi I think the question is to you here, since I'm still not super familiar with the internal plumbing of the msquic build system. Do you happen to know from which cmake file this dependency tree is generated? Part of me is wondering if this isn't some weird problem with CMake's FindOpenSSL.cmake module.

nhorman avatar Jul 02 '25 11:07 nhorman

I'm no expert on native builds, so I have no comments on the analysis 😄

But I wanted to say, as a side note, that we cannot ship with statically linked crypto, as that would make us (as MSFT) responsible for security patches.

ManickaP avatar Jul 02 '25 12:07 ManickaP

I can't comment on your security process, but I can confirm that all of the artifacts that you build in CI, as far as I can tell, statically link to both libcrypto and libssl from the openssl repository.

nhorman avatar Jul 02 '25 12:07 nhorman

-DQUIC_USE_SYSTEM_LIBCRYPTO=true is most likely the key here. It makes msquic statically link libssl but dynamically link libcrypto. That shouldn't be needed for OpenSSL proper (as opposed to the quictls fork), since all of OpenSSL can now be dynamically linked. But QUIC_USE_SYSTEM_LIBCRYPTO might need to be updated to dynamically link everything when using OpenSSL.

nibanks avatar Jul 02 '25 13:07 nibanks

> as far as I can tell statically link to both libcrypto and libssl from the openssl repository.

No, QUIC_USE_SYSTEM_LIBCRYPTO makes msquic statically link only ssl, but crypto is dynamically linked from the system.

> But QUIC_USE_SYSTEM_LIBCRYPTO might need to be updated to dynamic link everything if using openssl.

I think this is what we need.

BTW, I tried removing QUIC_USE_SYSTEM_LIBCRYPTO from my build and I get a bunch of other errors. I'll investigate it more, to make sure it's not .NET's fault, and file a separate issue for it.

ManickaP avatar Jul 02 '25 13:07 ManickaP

@ManickaP do you know how the CMakeLists.txt for submodules/ works? I'm looking at it, and it appears to me that when USE_SYSTEM_LIBCRYPTO is set, both libcrypto and libssl are getting statically linked, so clearly I'm missing something.

nhorman avatar Jul 30 '25 19:07 nhorman

I barely understand anything CMake related, but the difference seems to be here: https://github.com/microsoft/msquic/blob/2a4199703a7c1921915c4c6373cc9a2a0b0b9e2e/submodules/CMakeLists.txt#L351 linking to a system path of crypto, versus: https://github.com/microsoft/msquic/blob/2a4199703a7c1921915c4c6373cc9a2a0b0b9e2e/submodules/CMakeLists.txt#L359-L362 linking to a locally built crypto from the submodule.

@wfurt will know more about this though.

IMHO, for OpenSSL 3.5 you shouldn't even need or have a submodule. You should be building with the system-installed OpenSSL 3.5, both the SSL and Crypto parts. And I don't think MsQuic needs to keep the static linking at all for this case. But that's more for a discussion with @anrossi and the MsQuic team.

ManickaP avatar Jul 31 '25 08:07 ManickaP

@ManickaP but the system-installed openssl libraries may not be version 3.5 or later. In fact they almost certainly will not be. Most distributions still ship openssl-3.0 or 3.2.

As for the cmake issue, that makes sense. I suppose the right-ish approach would be to modify the second case to link OpenSSL::Crypto so we get dynamic linkage in both cases.

nhorman avatar Jul 31 '25 15:07 nhorman

@ManickaP can you please try your tests with this patch:

diff --git a/submodules/CMakeLists.txt b/submodules/CMakeLists.txt
index b6d074d68..ba1c46c0c 100644
--- a/submodules/CMakeLists.txt
+++ b/submodules/CMakeLists.txt
@@ -333,7 +333,8 @@ else()
     target_link_libraries(
         OpenSSLQuic
         INTERFACE
-        ${LIBSSL_PATH}
+        OpenSSL::Crypto
+        OpenSSL::SSL
     )
 
     if (QUIC_USE_SYSTEM_LIBCRYPTO)
@@ -348,7 +349,7 @@ else()
             if (OPENSSL_VERSION VERSION_EQUAL EXPECTED_OPENSSL_VERSION OR OPENSSL_MAJORMINOR VERSION_EQUAL EXPECTED_OPENSSL_VERSION OR
                 # 3.1 is compatible with 3.0, 3.2 and beyond maybe as well.
                 (EXPECTED_OPENSSL_VERSION VERSION_EQUAL "3.0" AND OPENSSL_MAJOR EQUAL "3"))
-                target_link_libraries(OpenSSLQuic INTERFACE OpenSSL::Crypto)
+                target_link_libraries(OpenSSLQuic INTERFACE OpenSSL::Crypto OpenSSL::SSL)
             else()
                 message(FATAL_ERROR "OpenSSL ${EXPECTED_OPENSSL_VERSION} not found, found ${OPENSSL_VERSION}")
             endif()
@@ -359,7 +360,7 @@ else()
         target_link_libraries(
             OpenSSLQuic
             INTERFACE
-            ${LIBCRYPTO_PATH}
+            OpenSSL::Crypto OpenSSL::SSL
         )
     endif()

This isn't at all correct, as it breaks the powershell build, but it does fix the direct cmake build, and results in libcrypto and libssl getting dynamically linked to libmsquic.so.2, so it would be good to confirm it at least resolves your problem, letting us know we're on the right path. Note, with the new dynamic linking, you will likely need to use LD_LIBRARY_PATH, or an rpath option in the library to ensure the right libcrypto/ssl get loaded at run time.
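
To confirm the linkage after applying a change like this, the NEEDED entries in the dynamic section show which libraries the loader will look for at run time. A small sketch, assuming GNU readelf output; needed_libs is a made-up helper name:

```shell
# Hypothetical helper: extract library names from the NEEDED entries
# of `readelf -d` output on stdin.
needed_libs() {
    # Typical line:  0x0000000000000001 (NEEDED)  Shared library: [libcrypto.so.3]
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# Example usage (path is a placeholder):
#   readelf -d ./bin/Debug/libmsquic.so.2 | needed_libs
```

With both libraries linked dynamically, one would expect both a libssl and a libcrypto entry in that list; with the original build only libcrypto shows up, matching the ldd output earlier in the thread.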

nhorman avatar Jul 31 '25 16:07 nhorman

I assume the test runs on the same machine where it was built, @ManickaP? With that, it should work with or without system crypto. If this is not the case, things may get more tricky. We ran into some issues with missing symbols a while back with @nibanks, but it was never fully resolved.

wfurt avatar Jul 31 '25 16:07 wfurt

> but the system installed openssl libraries may not be version 3.5 or later.

The build against OpenSSL 3.5 should be done on a system with OpenSSL 3.5 and only for systems with OpenSSL 3.5. For the rest, we have to keep the QuicTLS build and use that. Using MsQuic with statically built-in crypto is a deal breaker and not just for us. And AFAIK all of the published linux packages use system crypto.

ManickaP avatar Aug 01 '25 08:08 ManickaP

> can you please try your tests with this patch:

It works, no need to set LD_LIBRARY_PATH, I already have OpenSSL 3.5 on my system.

ManickaP avatar Aug 01 '25 09:08 ManickaP

Ok, thank you, that's good news. I just need to figure out why the powershell build is breaking then. I'm sure I just did something stupid.

nhorman avatar Aug 01 '25 12:08 nhorman

I've created a PR for this issue, which addresses the static/dynamic mismatch when using USE_SYSTEM_LIBCRYPTO in the cmake build system. I've opted to leave the powershell build setup alone for now. I know there was a comment that openssl should always be dynamically linked, as per MS security policy, but making that change is going to be a significant additional undertaking. We would need to come up with a way to link dynamically against the local submodule, instead of using CMake's approach of relying on the locally installed openssl pkgconfig file to find the library path, and then adjust all the tests to set LD_LIBRARY_PATH (or the Windows equivalent) accordingly, so the right libraries are found at run time for testing. It seems like something that should be done in a separate PR.

nhorman avatar Aug 01 '25 12:08 nhorman

Thanks for the PR.

The challenge we are facing here is that:

  • we want to link dynamically to the system libcrypto
    • to avoid being on the upgrade path in case of security issues
    • because of FIPS certifications
  • we link statically to libssl because most system OpenSSL won't support QUIC yet

Fully dynamically linking will be a great solution once OpenSSL 3.5 is available in most distributions. We will need a solution in the meantime, I'll discuss with @anrossi .

guhetier avatar Aug 01 '25 16:08 guhetier

@guhetier ostensibly the solution here, I think (until such time as a sufficiently recent version of openssl is available in an OS-distributed platform package), would be to:

  1. build libcrypto/libssl as dynamically shared objects
  2. install them in a subdirectory of the msquic installation
  3. run time bind the libcrypto/libssl DSOs to libmsquic

Step 3 on Linux is likely a matter of using something like -Wl,-rpath,/path/to/dsos to have the loader look in the right place for the needed libs. macOS would likely need a similar solution. I'm unsure what the Windows solution is (maybe the App Paths registry key, or just co-locating the DLLs).
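
For step 3 on Linux, a minimal sketch of what the link flags could look like, assuming the DSOs are installed in a subdirectory next to libmsquic.so. The directory name openssl and the helper name rpath_flags are made up for illustration; $ORIGIN is a loader token that expands to the directory of the object carrying the rpath:

```shell
# Hypothetical helper: build linker flags that embed a run-time search path
# relative to the directory containing the linked shared object.
# $1 is the subdirectory holding the bundled libssl/libcrypto DSOs.
rpath_flags() {
    # $ORIGIN is expanded by the dynamic loader, not by the shell,
    # so it is escaped here and printed literally.
    printf '%s\n' "-Wl,-rpath,\$ORIGIN/$1"
}

# Example usage (object list and library directory are placeholders):
#   cc -shared -o libmsquic.so <objects> $(rpath_flags openssl) -L<dso dir> -lssl -lcrypto
#   readelf -d libmsquic.so | grep -E 'RPATH|RUNPATH'
```

Using a path relative to $ORIGIN rather than an absolute one keeps the installation relocatable, which matters if the package can be unpacked into different prefixes.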

nhorman avatar Aug 01 '25 16:08 nhorman