liboqs
liboqs copied to clipboard
Fix memory leak when using OpenSSL and threads
'run_tests' memory leak tests fails when build options enables OQS_USE_PTHREADS and OQS_USE_OPENSSL.
valgrind tests/test_sig SPHINCS+-SHA2-256s-simple
==158054== Memcheck, a memory error detector
==158054== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==158054== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==158054== Command: tests/test_sig SPHINCS+-SHA2-256s-simple
==158054==
Testing signature algorithms using liboqs version 0.10.2-dev
Configuration info
==================
Target platform: x86_64-Linux-5.4.0-126-generic
Compiler: gcc (9.4.0)
Compile options: [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version: 0.10.2-dev
Git commit: f47817d8cb0cd621dada5276f30f99934f3132a9
OpenSSL enabled: Yes (OpenSSL 3.3.3.8.3.0-dev )
AES: NI
SHA-2: OpenSSL
SHA-3: OpenSSL
OQS build flags: OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active: AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
================================================================================
Sample computation for signature SPHINCS+-SHA2-256s-simple
================================================================================
verification passes as expected
==158054==
==158054== HEAP SUMMARY:
==158054== in use at exit: 240 bytes in 1 blocks
==158054== total heap usage: 7,654 allocs, 7,653 frees, 844,712 bytes allocated
==158054==
==158054== LEAK SUMMARY:
**==158054== definitely lost: 240 bytes in 1 blocks**
==158054== indirectly lost: 0 bytes in 0 blocks
==158054== possibly lost: 0 bytes in 0 blocks
==158054== still reachable: 0 bytes in 0 blocks
==158054== suppressed: 0 bytes in 0 blocks
==158054== Rerun with --leak-check=full to see details of leaked memory
==158054==
==158054== For lists of detected and suppressed errors, rerun with: -s
==158054== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Fixes #1941.
- [ ] Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.)
- [ ] Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged.)
This is present in the 0.11.0 release, I guess? Is it critical to fix?
Do we have a CI job that tests this configuration?
This is present in the 0.11.0 release, I guess? Is it critical to fix?
Do we have a CI job that tests this configuration?
For what it's worth, OQS_USE_PTHREADS is not intended to be exposed to the user as a config variable; it's set automatically based on the availability of pthreads and not documented in CONFIGURE.md. I would guess that all of our Linux and macOS CI systems have pthreads, so we are probably not running any tests without pthreads enabled—which makes me wonder why we haven't previously detected this leak. I'm hoping that maybe this leak only arises when OQS_USE_PTHREADS is set manually instead of being autoset by CMake.
I'm not able to reproduce the leak myself—@ashman-p could you let me know what Linux environment you're running so I can test various configs out in Docker images?
I'm not able to reproduce the leak myself—@ashman-p could you let me know what Linux environment you're running so I can test various configs out in Docker images?
@SWilson4 , Thanks for checking on this issue. I just have a generic linux setup. Linux nashley-dev 5.4.0-126-generic #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
liboqs Build
cmake -DBUILD_SHARED_LIBS=ON -DOQS_USE_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug -GNinja ..
-- The C compiler identification is GNU 9.4.0
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Performing Test CC_SUPPORTS_WA_NOEXECSTACK
-- Performing Test CC_SUPPORTS_WA_NOEXECSTACK - Success
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Algorithms filtered for KEM_ml_kem_512;KEM_ml_kem_768;KEM_ml_kem_1024;KEM_ml_kem_512_ipd;KEM_ml_kem_768_ipd;KEM_ml_kem_1024_ipd;SIG_ml_dsa_44;SIG_ml_dsa_65;SIG_ml_dsa_87;SIG_sphincs_sha2_128f_simple;SIG_sphincs_sha2_128s_simple;SIG_sphincs_sha2_256f_simple;SIG_sphincs_sha2_256s_simple;SIG_sphincs_shake_128f_simple;SIG_sphincs_shake_128s_simple;SIG_sphincs_shake_256f_simple
-- Found OpenSSL: /home/ubuntu/openssl3.3/ossl-install/lib/libcrypto.so (Required is at least version "1.1.1")
-- Looking for aligned_alloc
-- Looking for aligned_alloc - found
-- Looking for posix_memalign
-- Looking for posix_memalign - found
-- Looking for memalign
-- Looking for memalign - found
-- Looking for explicit_bzero
-- Looking for explicit_bzero - found
-- Looking for explicit_memset
-- Looking for explicit_memset - not found
-- Looking for memset_s
-- Looking for memset_s - not found
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.17") found components: doxygen dot
-- Configuring done
-- Generating done
@SWilson4, the problem occurs when use of OQS_USE_SHA3_OPENSSL=ON and threading is active. The only significance of the use of threads is that the problem went away when threading was disabled.
OK, I've managed to reproduce the leak, but only when building against OpenSSL >= 3.3.2. In particular, the leak does not occur when building against OpenSSL 3.3.1 with the same configuration.
It seems to me that this might actually be an OpenSSL bug rather than a liboqs bug, especially as the fix proposed here should (?) be unnecessary: per the OpenSSL docs:
As of version 1.1.0 OpenSSL will automatically allocate all resources that it needs so no explicit initialisation is required. Similarly it will also automatically deinitialise as required.
Looping in @baentsch @romen @beldmit @levitte: any knowledge of a regression going from 3.3.1 to 3.3.2?
I will try to isolate the exact commit which introduces the leak.
At first glance nothing suspicious
I will try to isolate the exact commit which introduces the leak.
It seems that this leak was introduced in https://github.com/openssl/openssl/commit/83efcfdfa1de760bd30df7f4cf94e7a0d2b0db9f. When building against the parent of that commit (https://github.com/openssl/openssl/commit/a13df68796828794920403c31d77409b0f06bae0) the leak does not occur.
^ fyi @beldmit
@nhorman could you please take a look?
FWIW, i don't see anything expressly wrong with the changes, its fine to call OPENSSL_init_crypto in your application rather than having the library do it as needed, but I'm having a hard time understanding:
- Where exactly the leak is
- How manually calling OpenSSL_init_crypto fixes it
I'll try reproduce here, but since you already have it set up, can you re-run valgrind with --leak-check=full to show the stack trace of where the leaked memory is being allocated?
Sure, here's the valgrind output:
==462229==
==462229== HEAP SUMMARY:
==462229== in use at exit: 240 bytes in 1 blocks
==462229== total heap usage: 6,841 allocs, 6,840 frees, 845,217 bytes allocated
==462229==
==462229== 240 bytes in 1 blocks are definitely lost in loss record 1 of 1
==462229== at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==462229== by 0x4AEC3C3: CRYPTO_malloc (mem.c:202)
==462229== by 0x4AEC44D: CRYPTO_zalloc (mem.c:222)
==462229== by 0x4B06246: ossl_rcu_read_lock (threads_pthread.c:410)
==462229== by 0x49D7FB8: module_find (conf_mod.c:403)
==462229== by 0x49D7A9D: module_run (conf_mod.c:259)
==462229== by 0x49D782F: CONF_modules_load (conf_mod.c:166)
==462229== by 0x49D796A: CONF_modules_load_file_ex (conf_mod.c:215)
==462229== by 0x49D8CB9: ossl_config_int (conf_sap.c:68)
==462229== by 0x4AEAF13: ossl_init_config (init.c:258)
==462229== by 0x4AEAEF5: ossl_init_config_ossl_ (init.c:256)
==462229== by 0x4F31EC2: __pthread_once_slow (pthread_once.c:116)
==462229==
==462229== LEAK SUMMARY:
==462229== definitely lost: 240 bytes in 1 blocks
==462229== indirectly lost: 0 bytes in 0 blocks
==462229== possibly lost: 0 bytes in 0 blocks
==462229== still reachable: 0 bytes in 0 blocks
==462229== suppressed: 0 bytes in 0 blocks
==462229==
==462229== For lists of detected and suppressed errors, rerun with: -s
==462229== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
OpenSSL is built with
cd ~/openssl
git checkout 83efcfdfa1
./config --debug --prefix=`pwd`/../.localopenssl_83efcfdfa1_debug && make -j 4 && make install_sw install_ssldirs
liboqs is built with
cd ~/liboqs
git checkout 7f4c89b2
mkdir build && cd build
cmake -GNinja -DOPENSSL_ROOT_DIR=../.localopenssl_83efcfdfa1_debug -DOQS_MINIMAL_BUILD="SIG_ml_dsa_65" -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug ..
ninja
The triggering command is
valgrind --leak-check=full ~/liboqs/build/tests/test_sig ML-DSA-65
Thank you. Thats interesting, I just tried to recreate with the head of the master branch, and didn't encounter the leak. Given that you reported the issue was introduced with commit 83efcfd (which is a backport of the original commit to the 3.3 branch, that makes me think we fixed something in master that we didn't backport. Let me see if I can find it
Thank you. Thats interesting, I just tried to recreate with the head of the master branch, and didn't encounter the leak. Given that you reported the issue was introduced with commit 83efcfd (which is a backport of the original commit to the 3.3 branch, that makes me think we fixed something in master that we didn't backport. Let me see if I can find it
Weird—I'm also getting the leak on the latest master (https://github.com/openssl/openssl/commit/c262cc0c0444f617387adac3ed4cad9f05f9c526).
hmm, what version of valgrind are you using? I'm on valgrind-3.23.0
also, what does your openssl.conf look like? Its possible I'm not loading any modules, and as such not triggering the allocation thats leaking
In fact, I'm sure its my config, I just ran it through gdb and never hit a breakpoint in ossl_rcu_read_lock
There we go, if I load the oqs provider (duh, should have thought of that), I see the leak. Now to figure out why
I'm on valgrind-3.22.0, and I haven't made any manual changes to openssl.cnf (including to load any provider).
I'm running everything in a Docker container on x86_64 / Linux; I'll attach a Dockerfile to duplicate my setup in case it's helpful to you.
FROM openquantumsafe/ci-ubuntu-latest:latest
WORKDIR /root
RUN git clone --depth=1 --branch=0.11.0 https://github.com/open-quantum-safe/liboqs.git liboqs
RUN git clone --branch=master https://github.com/openssl/openssl.git openssl
WORKDIR /root/openssl
RUN ./config --debug --prefix=/root/.localopenssl && make -j && make install_sw install_ssldirs
WORKDIR /root/liboqs
RUN mkdir build
WORKDIR ./build
RUN cmake -GNinja -DOPENSSL_ROOT_DIR=../.localopenssl -DOQS_MINIMAL_BUILD="SIG_ml_dsa_65" -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug ..
RUN ninja
Ok, think I found the problem. https://github.com/openssl/openssl/commit/83efcfdfa1de760bd30df7f4cf94e7a0d2b0db9f added a use of ossl_init_thread_start to the code base, for which there is only a handful of users. ossl_init_thread_start registers a handler function to clean up thread local data, not when the thread exits explicitly, but rather when the library context against which the thread allocates resources is deleted. I'm not 100% sure why we do this, but its how it works.
Normally its not a problem. Library contexts get deleted in OSSL_LIB_CTX_free, that calls context_deinit->ossl_ctx_thread_stop->init_thread_stop, and that calls all the handlers registered via ossl_init_thread_start, and everything gets cleaned up.
However, in liboqs, you implicitly call the OpenSSL_cleanup routine via the registered atexit() handler, which again, is ok, but, it means that in test_sig.c (and likely your other tests), you join and exit the thread prior to OpenSSL getting cleaned up. It appears whats happening is that, on exit, the C library is cleaning up its thread local data list (I feel like I investigated this before), and, not having an cleanup handler registered (recalling from above that we register the local data handler with libcrypto, not with libc), and so it just NULL's the pointer in the thread local data table in the c library. As a result when OpenSSL_cleanup is called (either explicitly or implicitly), we run through the libcrypto registered cleanup handlers, the handler for rcu gets called (ossl_rcu_free_local_data), and when we call CRYPTO_THREAD_get_local to fetch the local data to free (which maps to pthread_getspecific) it returns NULL, as libc already expunged that pointer. As a result the local data leaks.
This API could use some reworking, as It seems to me that it relies on implementation details in libc that aren't guaranteed (I think in the past those pointers stuck around until after all the exit handlers ran, but I need to look further). The documentation here: https://docs.openssl.org/3.0/man3/OPENSSL_init_crypto/#description Is sort of vague on the subject, indicating that the cleanup should be done on thread exit, but thats not really true, as its actually done on behalf of the thread when the library context is cleaned.
For the time being, as any fix here would be an ABI break, I think its something we need to live with. The good news(?) is that theres an API to manage this, and theres precedent for it in our tests. OPENSSL_thread_stop is used to explicitly clean up a threads local data prior to exiting, which can be used here. You can see it used in situations similar/identical to yours in drbgtest.c and threadstest.h in which thread wrappers like thread_run() call it prior to exiting for just this reason.
You will likely need to do this for your various thread enabled tests, but this patch:
diff --git a/tests/test_sig.c b/tests/test_sig.c
index e94d3034..bd5e4728 100644
--- a/tests/test_sig.c
+++ b/tests/test_sig.c
@@ -9,6 +9,7 @@
#include <string.h>
#include <oqs/oqs.h>
+#include <openssl/crypto.h>
#if OQS_USE_PTHREADS
#include <pthread.h>
@@ -183,6 +184,7 @@ struct thread_data {
void *test_wrapper(void *arg) {
struct thread_data *td = arg;
td->rc = sig_test_correctness(td->alg_name);
+ OPENSSL_thread_stop();
return NULL;
}
#endif
which I tested here, resolves the leak for me
Hi @nhorman, thanks for tracking this down. An observation I made earlier when this issue was discovered was that it seems to manifest with SHA3 fetches but not SHA2. Does this still fit with your findings?
It's a bit late here now, but I'll take a look in the am and let you know
@ashman-p do you have a particular signature algorithm that you see as not encountering the leak? I'm looking at the available defined SIG algs and I don't see any available that make use of SHA3
resolves the leak for me
Thanks @nhorman for the analysis and the proposed fix. I guess the proper location for your fix would be here?
@baentsch yes, that seems like a reasonable approach I think.
@SWilson4 @ashman-p @dstebila Please also take note of this guidance from OpenSSL documentation:
Similarly this message will also not be sent if OpenSSL is linked statically, and therefore applications using static linking should also call OPENSSL_thread_stop() on each thread.
and in addition
On Linux/Unix where OpenSSL has been loaded via dlopen() and the application is multi-threaded and if dlclose() is subsequently called prior to the threads being destroyed then OpenSSL will not be able to deallocate resources associated with those threads. The application should either call OPENSSL_thread_stop() on each thread prior to the dlclose() call, or alternatively the original dlopen() call should use the RTLD_NODELETE flag (where available on the platform).
So I think it is actually mandatory for liboqs to call this function given it should work both statically linked as well as when used via dlopen() -- which I think is the primary use case (and surely when used as part of (a dynamically loaded) oqsprovider), no? Only question is how to ensure it's called on each thread?
The more I read the more I wonder how/why this ever worked OK given this guidance and by-default use of pthreads. The sig test also is not the only one actually using threading: At first glance, KEM and SIG_STFL testing do the same...
cmake -GNinja -DOPENSSL_ROOT_DIR=../.localopenssl_83efcfdfa1_debug -DOQS_MINIMAL_BUILD="SIG_ml_dsa_65" -DOQS_USE_SHA3_OPENSSL=ON -DCMAKE_BUILD_TYPE=Debug ..
@nhorman, I meant to say that the problem only manifested with "-DOQS_USE_SHA3_OPENSSL=ON". Not that there is a sig test actually uses sha3. Defining OQS_USE_OPENSSL=ON means the OQS library uses OpenSSL's SHA2. My config additionally sets OQS_USE_SHA3_OPENSSL=ON. Without this, Spencer could not recreate the problem. Thats why I asked the question about differences between these 2 options. It seems to me that the problem would be present regardless. fetch_ossl_objects(void) in common/ossl_helpers.c calls EVP_MD_fetch() for sha2 and sha3. But, I am not sure where sha3 is used.
@ashman-p yes, I would think so, I'm a bit tied up at the moment, but given what we've found I would imagine that the leak would occur in that configuration as well, and be solved by the same proposed fix
fetch_ossl_objects(void) in common/ossl_helpers.c calls EVP_MD_fetch() for sha2 and sha3. But, I am not sure where sha3 is used.
That is a very good observation, @ashman-p -- this code now seems dubious to me: Shouldn't the EVP-alg-fetchers be contingent/ifdef'd upon the 3 corresponding build options? What's your take on that @beldmit ?
@baentsch you are 100% correct, they should be properly wrapped by #ifdefs - if we don't use SHA3 or AES from OpenSSL, it makes no sense to fetch it.
I have originally written this prefetching code because of performance reasons instead of implicit fetching used before, but AFAIK there were many changes since then - e.g. an option to provide 3rd-party callbacks etc so this should be reconsidered