aws-sdk-cpp
aws-sdk-cpp copied to clipboard
InitAPI+ShutdownAPI+InitAPI crashes
Describe the bug
I have a test suite where each test calls Aws::InitAPI, then executes a test and finally calls Aws::ShutdownAPI. The second test then crashes in Aws::InitAPI. This problem started occurring when upgrading aws-sdk-cpp from version 1.8.186 to 1.9.352.
This looks similar to #2102, but that issue is closed and I don't quite understand what the reporter's solution mentioned in https://github.com/aws/aws-sdk-cpp/issues/2102#issuecomment-1254440532 would be.
Expected Behavior
A second Aws::InitAPI does not crash after Aws::InitAPI + Aws::ShutdownAPI.
Current Behavior
A second Aws::InitAPI crashes after Aws::InitAPI + Aws::ShutdownAPI.
Reproduction Steps
Compile and run the following program, linking against aws-sdk-cpp 1.9.352:
#include <aws/core/Aws.h>
int main()
{
Aws::SDKOptions options;
Aws::InitAPI(options);
Aws::ShutdownAPI(options);
Aws::InitAPI(options); // line 10
Aws::ShutdownAPI(options);
}
The program will crash at line 10.
GDB backtrace:
(gdb) bt
#0 0x00007ffff791166f in s2n_x509_trust_store_wipe (store=0x118) at crt/aws-crt-cpp/crt/s2n/tls/s2n_x509_validator.c:135
#1 0x00007ffff78e0988 in s2n_config_cleanup (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:125
#2 0x00007ffff78e1373 in s2n_config_free (config=0x0) at crt/aws-crt-cpp/crt/s2n/tls/s2n_config.c:360
#3 0x00007ffff78bfa84 in s_s2n_ctx_destroy (s2n_ctx=0x82ba20) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1292
#4 0x00007ffff78c0bd1 in s_tls_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68, mode=S2N_CLIENT) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1613
#5 0x00007ffff78c0c3c in aws_tls_client_ctx_new (alloc=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>, options=0x7fffffffbb68) at crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:1625
#6 0x00007ffff783b758 in Aws::Crt::Io::TlsContext::TlsContext (this=0x7fffffffbcd0, options=..., mode=Aws::Crt::Io::TlsMode::CLIENT, allocator=0x7ffff7dd1620 <Aws::get_aws_allocator()::wrapper>) at crt/aws-crt-cpp/source/io/TlsOptions.cpp:423
#7 0x00007ffff773ebe2 in Aws::InitAPI (options=...) at aws-cpp-sdk-core/source/Aws.cpp:79
#8 0x0000000000400a2a in main () at reproduce.cpp:10
Valgrind output:
==176222== Memcheck, a memory error detector
==176222== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==176222== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==176222== Command: ./reproduce
==176222==
==176222== Invalid read of size 8
==176222== at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222== by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222== by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222== by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222== by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222== by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222== by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222== by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222== by 0x400A29: main (reproduce.cpp:10)
==176222== Address 0x118 is not stack'd, malloc'd or (recently) free'd
==176222==
==176222==
==176222== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==176222== Access not within mapped region at address 0x118
==176222== at 0x52E166F: s2n_x509_trust_store_wipe (s2n_x509_validator.c:135)
==176222== by 0x52B0987: s2n_config_cleanup (s2n_config.c:125)
==176222== by 0x52B1372: s2n_config_free (s2n_config.c:360)
==176222== by 0x528FA83: s_s2n_ctx_destroy (s2n_tls_channel_handler.c:1292)
==176222== by 0x5290BD0: s_tls_ctx_new (s2n_tls_channel_handler.c:1613)
==176222== by 0x5290C3B: aws_tls_client_ctx_new (s2n_tls_channel_handler.c:1625)
==176222== by 0x520B757: Aws::Crt::Io::TlsContext::TlsContext(Aws::Crt::Io::TlsContextOptions&, Aws::Crt::Io::TlsMode, aws_allocator*) (TlsOptions.cpp:423)
==176222== by 0x510EBE1: Aws::InitAPI(Aws::SDKOptions const&) (Aws.cpp:79)
==176222== by 0x400A29: main (reproduce.cpp:10)
Possible Solution
From the backtrace, it seems incorrect for s_s2n_ctx_destroy (frame #3) to pass a null config to s2n_config_free, given that s2n_config_free doesn't perform a null check. But how come the config is null and the init code path is calling cleanup functions?
As far as I can tell, this is what happens:
- The first
Aws::InitAPIcallsAws::InitializeCrt→Aws::New<Aws::Crt::ApiHandle>→Aws::Crt::ApiHandle::ApiHandle→Aws::Crt::s_initApi→aws_mqtt_library_init→aws_io_library_init→aws_tls_init_static_state→s2n_disable_atexit, which setsatexit_cleanup = false. aws_tls_init_static_statethen callss2n_init, which setsinitialized = true.- The first
Aws::ShutdownAPIcallsAws::CleanupCrt→Aws::Delete<Aws::Crt::ApiHandle>→Aws::Crt::ApiHandle::~ApiHandle→aws_s3_library_clean_up→aws_http_library_clean_up→aws_io_library_clean_up→aws_tls_clean_up_static_state→s2n_cleanup.initializedis still true andatexit_cleanupis still false. - The second
Aws::InitAPIcallsAws::InitializeCrt→Aws::Crt::ApiHandle::ApiHandle→Aws::Crt::s_initApi→aws_mqtt_library_init→aws_io_library_init→aws_tls_init_static_state→s2n_disable_atexit, which fails sinceinitializedis true. aws_tls_init_static_statesetss_s2n_initialized_externally = trueand therefore doesn't calls2n_init.Aws::InitAPI→Aws::Crt::Io::TlsContext::TlsContext→aws_tls_client_ctx_new→s_tls_ctx_new→s2n_config_newfails sinces2n_inithasn't been called.s_s2n_ctx_destroyis called due to the above failure.s2n_config_freeis not prepared for freeing a half-initializeds2n_ctx.
I haven't verified this, but from code inspection I think that the regression might have been introduced in https://github.com/awslabs/aws-c-io/commit/4e462a0687d00d18cfdece3cd7a427d8a9b2e13d since before that change the return value of s2n_disable_atexit did not affect whether aws_tls_init_static_state calls s2n_init.
To me it looks like the problem is that Aws::ShutdownAPI doesn't fully clean up the s2n state, so the fix I came up with is to let s2n_cleanup_atexit_impl reset initialized and atexit_cleanup to their original values:
--- a/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
+++ b/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c
@@ -88,6 +88,9 @@ static bool s2n_cleanup_atexit_impl(void)
/* the configs need to be wiped before resetting the memory callbacks */
s2n_wipe_static_configs();
+ initialized = false;
+ atexit_cleanup = true;
+
return s2n_result_is_ok(s2n_rand_cleanup_thread()) &&
s2n_result_is_ok(s2n_rand_cleanup()) &&
s2n_result_is_ok(s2n_locking_cleanup()) &&
Additional Information/Context
No response
AWS CPP SDK version used
1.9.352
Compiler and Version used
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Operating System and version
CentOS 7.9.2009
Got the very same problem, was just going to open an issue
Same backtraces, same conclusions on investigation: problem is initialized is true in s2n_init.c at the moment of re-initialization. Didn't try to fix it though
Trying to reproduce this, but I'm not seeing any crash on Mac. @EgorkaZ which OS and SDK version are you seeing this on?
OS: Ubuntu 20.04 SDK: 1.9.342
Can try it on Mac. AFAIR the code, it should reproduce there too. @jmklix
We have fixed some memory leaks and changed the build process since this was opened. Can you confirm if you are still seeing this crash?
We have fixed some memory leaks and changed the build process since this was opened. Can you confirm if you are still seeing this crash?
I see no crash with version 1.10.49.
⚠️COMMENT VISIBILITY WARNING⚠️
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.