aws-sdk-cpp

Shutdown API Segfault and Valgrind Issues: Aws::Utils::Crypto::CleanupCrypto() + CleanupHttp()

Open afowles opened this issue 4 years ago • 4 comments

Describe the bug

SDK shutdown segfaults and reports Valgrind memory errors, apparently around CleanupHttp / CleanupCrypto. This only seems to happen when a static pointer to a class that initializes and cleans up the SDK is used. I'm curious why this would be the case.

SDK version number 1.8.113

Platform/OS/Hardware/Device Fedora 33, (GCC) 10.2.1 20201125

To Reproduce (observed behavior)

#include <memory>

#include <aws/core/Aws.h>

class AwsLifetime {
 public:
  AwsLifetime() {
    Aws::InitAPI(options);
  }

  ~AwsLifetime() {
    Aws::ShutdownAPI(options);
  }

 private:
  Aws::SDKOptions options;
};

static std::unique_ptr<AwsLifetime> aws;

int main() {
  aws = std::make_unique<AwsLifetime>();
  return 0;
}

$ g++ -std=c++17 -Wall -Werror -O2 -o test test.cpp -laws-cpp-sdk-core
$ valgrind --tool=memcheck --leak-check=full ./test

Expected behavior

Runs with clean Valgrind output, as it does with SDK version 1.7.323 and (GCC) 10.1.1 20200507 on Fedora 32:

==3600== Memcheck, a memory error detector
==3600== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3600== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==3600== Command: ./test
==3600==
==3600==
==3600== HEAP SUMMARY:
==3600==     in use at exit: 0 bytes in 0 blocks
==3600==   total heap usage: 4,050 allocs, 4,050 frees, 237,867 bytes allocated
==3600==
==3600== All heap blocks were freed -- no leaks are possible
==3600==
==3600== For lists of detected and suppressed errors, rerun with: -s
==3600== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Logs/output

On Fedora 33 with the versions described above, the first invalid read is:

==591656== Memcheck, a memory error detector
==591656== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==591656== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==591656== Command: ./test
==591656==
==591656== Invalid read of size 8
==591656==    at 0x48C8809: Aws::Http::CleanupHttp() (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x490A27C: Aws::ShutdownAPI(Aws::SDKOptions const&) (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x4014A7: std::unique_ptr<AwsLifetime, std::default_delete<AwsLifetime> >::~unique_ptr() (in /home/fedora/test)
==591656==    by 0x4CF6236: __run_exit_handlers (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CF63DF: exit (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CDE1E8: (below main) (in /usr/lib64/libc-2.32.so)
==591656==  Address 0x5a87020 is 16 bytes inside a block of size 24 free'd
==591656==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==591656==    by 0x4CF6236: __run_exit_handlers (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CF63DF: exit (in /usr/lib64/libc-2.32.so)
==591656==    by 0x4CDE1E8: (below main) (in /usr/lib64/libc-2.32.so)
==591656==  Block was alloc'd at
==591656==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==591656==    by 0x48C8730: Aws::Http::InitHttp() (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x49304F3: Aws::InitAPI(Aws::SDKOptions const&) (in /usr/lib64/libaws-cpp-sdk-core.so)
==591656==    by 0x4011A7: main (in /home/fedora/test)
==591656==

Additional context

With logging enabled, simply running the program gives:

[~]$ ./test
[DEBUG] 2020-12-30 16:01:19.044 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Initializing config loader against fileName /home/fedora/.aws/credentials and using profilePrefix = 0
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Initializing config loader against fileName /home/fedora/.aws/config and using profilePrefix = 1
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Unable to open config file /home/fedora/.aws/credentials for reading.
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSProfileConfigLoader [139993724766464] Failed to reload configuration.
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Environment value for variable HOME is /home/fedora
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Home directory is missing the final / appending one to normalize
[DEBUG] 2020-12-30 16:01:19.045 FileSystemUtils [139993724766464] Final Home Directory is /home/fedora/
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSConfigFileProfileConfigLoader [139993724766464] Unable to open config file /home/fedora/.aws/config for reading.
[INFO] 2020-12-30 16:01:19.045 Aws::Config::AWSProfileConfigLoader [139993724766464] Failed to reload configuration.
[INFO] 2020-12-30 16:01:19.046 CurlHttpClient [139993724766464] Initializing Curl library with version: 7.71.1, ssl version: OpenSSL/1.1.1i-fips
[DEBUG] 2020-12-30 16:01:19.046 ClientConfiguration [139993724766464] ClientConfiguration will use SDK Auto Resolved profile: [default] if not specified by users.
[WARN] 2020-12-30 16:01:19.046 ClientConfiguration [139993724766464] Retry Strategy will use the default max attempts.
[INFO] 2020-12-30 16:01:19.046 EC2MetadataClient [139993724766464] Creating AWSHttpResourceClient with max connections 2 and scheme http
[INFO] 2020-12-30 16:01:19.046 CurlHandleContainer [139993724766464] Initializing CurlHandleContainer with size 2
[INFO] 2020-12-30 16:01:19.046 CurlHandleContainer [139993724766464] Cleaning up CurlHandleContainer.
Segmentation fault (core dumped)

Any insights would be appreciated

afowles, Dec 30 '20 16:12

Hi @afowles, thanks for pointing this out to us! As a quick update, I'm able to reproduce this and have started investigating the root cause of these Valgrind errors.

KaibaLopez, Jan 04 '21 22:01

Any updates? We are hitting the same issue after an aws-sdk update.

The issue seems to have appeared between the 1.7.311 and 1.8.159 releases.

xonatius, Oct 21 '21 22:10

Hello, I am using 1.7.108 and am getting the same issue. Here is what I am seeing in valgrind.

==426== Invalid read of size 8
==426==    at 0x694C10E: Aws::Http::CleanupHttp() (HttpClientFactory.cpp:180)
==426==    by 0x6911A98: Aws::ShutdownAPI(Aws::SDKOptions const&) (Aws.cpp:119)
==426==    by 0x67122E4: AwsSource::~AwsSource() (IArchiveHandler.hpp:97)
==426==    by 0x67171AC: void __gnu_cxx::new_allocator<AwsSource>::destroy<AwsSource>(AwsSource*) (new_allocator.h:124)
==426==    by 0x6717024: void std::allocator_traits<std::allocator<AwsSource> >::destroy<AwsSource>(std::allocator<AwsSource>&, AwsSource*) (alloc_traits.h:542)

And later on:

==426== Invalid read of size 4
==426==    at 0x66F3702: __gnu_cxx::__exchange_and_add(int volatile*, int) (atomicity.h:49)
==426==    by 0x66F37AB: __gnu_cxx::__exchange_and_add_dispatch(int*, int) (atomicity.h:82)
==426==    by 0x66F4ABD: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:147)
==426==    by 0x66F43F5: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:659)
==426==    by 0x6911C97: std::__shared_ptr<Aws::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:925)
==426==    by 0x694CE6F: std::__shared_ptr<Aws::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<Aws::Http::HttpClientFactory, (__gnu_cxx::_Lock_policy)2>&&) (shared_ptr_base.h:1000)
==426==    by 0x694CADD: std::shared_ptr<Aws::Http::HttpClientFactory>::operator=(std::shared_ptr<Aws::Http::HttpClientFactory>&&) (s

Still later:

==426== Invalid read of size 8
==426==    at 0x6975F65: Aws::Utils::Crypto::CleanupCrypto() (Factories.cpp:655)
==426==    by 0x6911A9D: Aws::ShutdownAPI(Aws::SDKOptions const&) (Aws.cpp:120)
==426==    by 0x67122E4: AwsSource::~AwsSource() (IArchiveHandler.hpp:97)
==426==    by 0x67171AC: void __gnu_cxx::new_allocator<AwsSource>::destroy<AwsSource>(AwsSource*) (new_allocator.h:124)
==426==    by 0x6717024: void std::allocator_traits<std::allocator<AwsSource> >::destroy<AwsSource>(std::allocator<AwsSource>&, AwsSource*) (alloc_traits.h:542)
==426==    by 0x6716E21: std::_Sp_counted_ptr_inplace<AwsSource, std::allocator<AwsSource>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:531)

bipinmathew, Dec 16 '21 21:12

Hello! We're facing the same problem on CentOS 7. Any update?

yuliy, Jul 13 '22 08:07

Same issue in 1.9.220 on CentOS 7 with gcc 7.3.1. I believe the issue is due to the static shared_ptr in HttpClientFactory potentially being cleaned up before other static cleanup that can call ShutdownAPI. The SDK should probably avoid statically initialized shared_ptr objects to prevent these kinds of errors.

mmacdonald86, Feb 01 '23 18:02

Hello all in this issue,

Please note that this usage is discouraged. Please refer to the documentation on how to initialize and shut down the SDK: https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/basic-use.html

The reason you observe memory issues by following this design idea

static std::unique_ptr<AwsLifetime> aws;

is that the order in which static objects in different translation units are destroyed is not fixed by the C++ standard, and the SDK and its dependencies also use static global objects. After your main returns, static variables are destroyed, and some of the SDK's global static variables may be destroyed before ShutdownAPI is called, resulting in undefined behavior.
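The ordering hazard described above can be modelled without the SDK. The following is only a sketch under stated assumptions: `Tracked`, `app_owner`, `library_state`, and `init_library` are hypothetical stand-ins, not SDK types, and the lazily created local static is just one way library state can end up being constructed after the application's static owner. Because static objects are destroyed in the reverse order of their construction, the lazily created object goes away first, and an owner destructor that still calls into it would touch a destroyed object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log, intentionally leaked so it stays valid while statics are destroyed.
std::vector<std::string>& events() {
    static auto* log = new std::vector<std::string>();
    return *log;
}

// Hypothetical stand-in for any object with static storage duration.
struct Tracked {
    explicit Tracked(const char* n) : name(n) {
        events().push_back(std::string("ctor ") + name);
    }
    ~Tracked() { events().push_back(std::string("dtor ") + name); }
    const char* name;
};

// Plays the role of the application's static owner; constructed before main().
static Tracked app_owner("app_owner");

// Plays the role of library state created lazily, e.g. during an init call.
Tracked& library_state() {
    static Tracked state("library_state");
    return state;
}

// Meant to be called from main(), so library_state is constructed after app_owner.
void init_library() {
    library_state();
    // app_owner finished construction first, so it will be destroyed last.
    assert(events().front() == "ctor app_owner");
}
```

In this model, `library_state` is destroyed before `app_owner` at exit, which is the same shape as a shutdown call running from a static destructor after the state it needs has already been torn down.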

We know that the use of static global variables, including std:: smart pointers and data classes, is considered bad design, and we are considering different ways of improving this. We've already made some improvements to stop std:: static objects from being destroyed prematurely/automatically, such as https://github.com/aws/aws-sdk-cpp/pull/2268. I suggest updating to the latest available version of the SDK (such as 1.11.6). However, we are not yet ready to declare SDK usage such as that shown above as fully supported.

Best regards, Sergey

SergeyRyabinin, Feb 01 '23 19:02

@SergeyRyabinin - Can you add to the documentation for ShutdownAPI that calling it from within the destructor of a static object is specifically unsupported (maybe that's already there)?

mmacdonald86, Feb 01 '23 19:02

We know that the usage of static global variables including std:: smart pointers and data classes is considered a bad design and we are considering different ways of improving on this aspect. We've already made some improvements to get rid of std:: static objects being destructed prematurely/automatically, such as https://github.com/aws/aws-sdk-cpp/pull/2268. I suggest to try to update to the latest version of the SDK available (such as 1.11.6). However, we are not there yet to declare SDK usage such as mentioned above as fully supported.

It is encouraging that you are moving in that direction and I hope you can get there. I work on Apache Arrow which uses S3. We have our own static state challenges :cold_sweat: and have been moving singletons from static global variables to static local variables that are initialized the first time they are accessed as this gives us a predictable creation (and thus destruction) order.
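As a rough illustration of the static-local approach described above, here is a minimal sketch. The class names `Allocator` and `Filesystem` and the helper `warm_up` are hypothetical and are not taken from Apache Arrow or the SDK. Each singleton is a function-local static, so it is constructed on first access, and a singleton that touches its dependency inside its own constructor is guaranteed to be destroyed before that dependency.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Construction log, leaked on purpose so it outlives every static object.
std::vector<std::string>& init_log() {
    static auto* log = new std::vector<std::string>();
    return *log;
}

// Hypothetical low-level service that other singletons depend on.
class Allocator {
public:
    static Allocator& instance() {
        static Allocator self;  // constructed on first call (thread-safe since C++11)
        return self;
    }
    int id() const { return 1; }

private:
    Allocator() { init_log().push_back("Allocator"); }
};

// Hypothetical higher-level service that uses Allocator.
class Filesystem {
public:
    static Filesystem& instance() {
        static Filesystem self;
        return self;
    }
    int allocator_id() const { return allocator_.id(); }

private:
    // Accessing Allocator::instance() here completes its construction first,
    // so Allocator is destroyed after this object.
    Filesystem() : allocator_(Allocator::instance()) {
        init_log().push_back("Filesystem");
    }
    Allocator& allocator_;
};

// Example use: the first access decides the construction order.
void warm_up() {
    Filesystem::instance();
    assert(init_log().front() == "Allocator");  // the dependency came first
}
```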

If this is fixed then we don't have to require users to call Finalize so it would be a nice feature.
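For readers wondering what an explicit Finalize step might look like, here is a hedged sketch. `SdkSession`, `fake_sdk`, and `run_app` are hypothetical names, and the `fake_sdk` functions merely stand in for Aws::InitAPI and Aws::ShutdownAPI so the example compiles without the SDK. The point is that shutdown is triggered by an explicit call made before main returns, never by a static destructor.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for Aws::InitAPI / Aws::ShutdownAPI. They only
// count how many sessions are live.
namespace fake_sdk {
inline int& live_sessions() { static int n = 0; return n; }
inline void init() { ++live_sessions(); }
inline void shutdown() { --live_sessions(); }
}  // namespace fake_sdk

class SdkSession {
public:
    // Idempotent: a second call is a no-op.
    static void Initialize() {
        std::lock_guard<std::mutex> lock(guard_mutex());
        if (!flag()) {
            fake_sdk::init();
            flag() = true;
        }
    }

    // Must be called explicitly before main() returns; no destructor does it.
    static void Finalize() {
        std::lock_guard<std::mutex> lock(guard_mutex());
        if (flag()) {
            fake_sdk::shutdown();
            flag() = false;
        }
    }

    static bool IsInitialized() {
        std::lock_guard<std::mutex> lock(guard_mutex());
        return flag();
    }

private:
    static std::mutex& guard_mutex() { static std::mutex m; return m; }
    static bool& flag() { static bool initialized = false; return initialized; }
};

// Typical call pattern, as it might appear in the body of main().
int run_app() {
    SdkSession::Initialize();
    assert(SdkSession::IsInitialized());
    // ... application work that uses the SDK ...
    SdkSession::Finalize();
    return 0;
}
```

The trade-off is the one noted above: callers have to remember to call Finalize, which is exactly the requirement a fix in the SDK would remove.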

westonpace, Apr 14 '23 17:04

Same problem here when calling the InitAPI and ShutdownAPI methods from within a singleton class.

Any updates on this? Does anyone know of a workaround?

tsarquis88, Jun 16 '23 13:06

The basic usage of the SDK should look like this, with all of the SDK calls between InitAPI and ShutdownAPI.

#include <aws/core/Aws.h>
int main(int argc, char** argv)
{
   Aws::SDKOptions options;
   Aws::InitAPI(options);
   {
      // make your SDK calls here.
   }
   Aws::ShutdownAPI(options);
   return 0;
}
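If an RAII wrapper is preferred, the same pattern can be kept as long as the wrapper is a local object inside main rather than a static. The following is a sketch only: `ScopedApi` and `with_api` are hypothetical helpers, not part of the SDK, and they take the init and shutdown steps as callbacks so the example compiles on its own.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Calls init on construction and shutdown on destruction. Intended to live
// on the stack inside main(), never in static storage.
class ScopedApi {
public:
    ScopedApi(const std::function<void()>& init, std::function<void()> shutdown)
        : shutdown_(std::move(shutdown)) {
        assert(init && shutdown_);  // both callbacks must be set
        init();
    }
    ~ScopedApi() { shutdown_(); }

    ScopedApi(const ScopedApi&) = delete;
    ScopedApi& operator=(const ScopedApi&) = delete;

private:
    std::function<void()> shutdown_;
};

// Runs work strictly between init and shutdown.
template <typename Work>
void with_api(const std::function<void()>& init,
              std::function<void()> shutdown,
              Work work) {
    ScopedApi guard(init, std::move(shutdown));
    work();  // create and destroy every SDK client inside this call
}

// With the real SDK, the call inside main() might look like:
//   Aws::SDKOptions options;
//   with_api([&] { Aws::InitAPI(options); },
//            [&] { Aws::ShutdownAPI(options); },
//            [&] { /* make your SDK calls here */ });
```

Because the guard is destroyed when `with_api` returns, shutdown runs before main exits and therefore before any static object is destroyed.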

Please let us know if you run into any more problems with this SDK.

jmklix, Sep 12 '23 17:09
