aws-sdk-cpp icon indicating copy to clipboard operation
aws-sdk-cpp copied to clipboard

Memory leak if s2n_cleanup is not called

Open magnushakansson opened this issue 3 years ago • 10 comments

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug When upgrading from 1.8.x version to 1.9.49 we see memory leaks reported by the ASAN tool. See ASAN log below.

SDK version number 1.9.49

Platform/OS/Hardware/Device Linux Debian

To Reproduce (observed behavior) We create Aws::S3::S3Client objects on several threads. When program is terminated, several leaks are reported.

When we add a call to s2n_cleanup, https://github.com/aws/s2n-tls/blob/main/docs/USAGE-GUIDE.md#s2n_cleanup, during shutdown for each thread created there are no leaks reported.

Expected behavior No memory leaks. Do we really need to call s2n_cleanup explicitly just because the implementation creates a UUID?

Logs/output

1
malloc
‎/usr/local/bin/engine
2
CRYPTO_zalloc
‎/usr/local/bin/engine
3
s2n_defend_if_forked
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:144:9
4
s2n_get_private_random_data
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:175:5
5
s2n_openssl_compat_rand
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:295:29
6
…::SecureRandomBytes_OpenSSLImpl::GetBytes
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/aws-cpp-sdk-core/source/utils/crypto/openssl/CryptoImpl.cpp:142:31
7
…::UUID::RandomUUID
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/aws-cpp-sdk-core/source/utils/UUID.cpp:79:27
8
…::STSAssumeRoleWebIdentityCredentialsProvider::STSAssumeRoleWebIdentityCredentialsProvider
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/aws-cpp-sdk-core/source/auth/STSCredentialsProvider.cpp:91:25
9
…::__1::__compressed_pair_elem<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2196:46
10
…::__1::__compressed_pair<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2289:42
11
…::__1::__shared_ptr_emplace<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:3562:12
12
…::__1::shared_ptr<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:4195:9
13
…::__1::enable_if<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:4415:12
14
…::__1::shared_ptr<…>
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/aws-cpp-sdk-core/include/aws/core/utils/memory/stl/AWSAllocator.h:51:16
15
…::DefaultAWSCredentialsProviderChain::DefaultAWSCredentialsProviderChain
‎/home/vagrant/jws/qix-pc3-build-ws/build/lib/aws-cpp-sdk-core/source/auth/AWSCredentialsProviderChain.cpp:41:17
16
…::__1::__compressed_pair_elem<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2196:46
17
…::__1::__compressed_pair<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2289:42
18
…::__1::__shared_ptr_emplace<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:3562:12
19
…::__1::shared_ptr<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:4195:9
20
…::__1::enable_if<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:4415:12
21
…::__1::shared_ptr<…>
‎/home/vagrant/jws/engine-common-ws/build/HEAD-release/../../Packages/aws-sdk-cpp/include/aws/core/utils/memory/stl/AWSAllocator.h:51:16
22
GetCredentialProvider
‎/home/vagrant/jws/engine-common-ws/build/HEAD-release/../../src/qaws/src/S3Client.cpp:115:12
23
…::S3Client::S3Client
‎/home/vagrant/jws/engine-common-ws/build/HEAD-release/../../src/qaws/src/S3Client.cpp:131:9
24
…::__1::__compressed_pair_elem<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2196:46
25
…::__1::__compressed_pair<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:2289:42
26
…::__1::__shared_ptr_emplace<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:3562:12
27
…::__1::enable_if<…>
‎/usr/lib/llvm-10/bin/../include/c++/v1/memory:4400:26
28
…::S3ObjectStream::S3ObjectStream

The first frame is our own:

    Client = Aws::MakeShared<Aws::S3::S3Client>(
        "S3Client",
        Aws::MakeShared<Aws::Auth::DefaultAWSCredentialsProviderChain>("DefaultAWSCredentialsProviderChain"),
        Config,
        Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::RequestDependent,
        false
        );

Additional context Memory leaks were not reported on several 1.8.x versions we have used.

magnushakansson avatar Jul 06 '21 17:07 magnushakansson

Could you try this workaround (not the fix) to suppress a specific leak in S2N by:

export LSAN_OPTIONS=suppressions=scripts/suppressions.txt

wps132230 avatar Jul 21 '21 22:07 wps132230

exporting the above (or should I have the file present with some content?) doesn't change a bit.

==261254== 6,336 bytes in 24 blocks are indirectly lost in loss record 6 of 6
==261254==    at 0x4842839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==261254==    by 0x4A726DD: CRYPTO_zalloc (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==261254==    by 0x4A63754: EVP_CipherInit_ex (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==261254==    by 0x50B9251: s2n_drbg_instantiate (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/crypto/s2n_drbg.c:162)
==261254==    by 0x50AE15D: s2n_defend_if_forked (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:143)
==261254==    by 0x50AE210: s2n_get_private_random_data (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:175)
==261254==    by 0x50AE544: s2n_openssl_compat_rand (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_random.c:295)
==261254==    by 0x4F9DF69: Aws::Utils::Crypto::SecureRandomBytes_OpenSSLImpl::GetBytes(unsigned char*, unsigned long) (aws-sdk-cpp/aws-cpp-sdk-core/source/utils/crypto/openssl/CryptoImpl.cpp:142)
==261254==    by 0x4F88337: Aws::Utils::UUID::RandomUUID() (aws-sdk-cpp/aws-cpp-sdk-core/source/utils/UUID.cpp:79)

and

==261254== 1,560 bytes in 20 blocks are still reachable in loss record 2 of 6
==261254==    at 0x4842839: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==261254==    by 0x50AD85C: s2n_mem_malloc_no_mlock_impl (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_mem.c:126)
==261254==    by 0x50ACC25: s2n_realloc (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_mem.c:195)
==261254==    by 0x50AC9B2: s2n_alloc (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_mem.c:157)
==261254==    by 0x50AD3D5: s2n_dup (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_mem.c:240)
==261254==    by 0x5104172: s2n_cipher_suites_init (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/tls/s2n_cipher_suites.c:1029)
==261254==    by 0x50AB632: s2n_init (aws-sdk-cpp/crt/aws-crt-cpp/crt/s2n/utils/s2n_init.c:44)
==261254==    by 0x505C657: aws_tls_init_static_state (aws-sdk-cpp/crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:132)
==261254==    by 0x5050CF2: aws_io_library_init (aws-sdk-cpp/crt/aws-crt-cpp/crt/aws-c-io/source/io.c:213)
==261254==    by 0x4FDAF58: aws_mqtt_library_init (aws-sdk-cpp/crt/aws-crt-cpp/crt/aws-c-mqtt/source/mqtt.c:186)
==261254==    by 0x4FBB422: Aws::Crt::s_initApi(aws_allocator*) (aws-sdk-cpp/crt/aws-crt-cpp/source/Api.cpp:39)
==261254==    by 0x4FBB3E1: Aws::Crt::ApiHandle::ApiHandle(aws_allocator*) (aws-sdk-cpp/crt/aws-crt-cpp/source/Api.cpp:52)

kwach avatar Oct 01 '21 15:10 kwach

I think this is the same issue:

https://github.com/aws/aws-sdk-cpp/issues/1757

theShmoo avatar Oct 19 '21 17:10 theShmoo

Could you try this workaround (not the fix) to suppress a specific leak in S2N by:

export LSAN_OPTIONS=suppressions=scripts/suppressions.txt

This worked. But could you propose a fix? This memory leak is quite a show stopper for us.

theShmoo avatar Oct 19 '21 18:10 theShmoo

Kind Reminder of this issue.

Aren't there any other people out there experience this issue?

theShmoo avatar Jan 10 '22 06:01 theShmoo

Kind monthly reminder of this issue :)

theShmoo avatar Feb 09 '22 20:02 theShmoo

We are facing the same issue. We make a lot of s3-crt GetObjectAsync calls, each one of which spins up a new thread (as we are using the DefaultExecutor) and adds to the memory usage.

@theShmoo how exactly did you get the export LSAN_OPTIONS hack working? It isn't helping me. [Update: Never mind. Just realized that this is just an option on the leak sanitizer to suppress warnings about this leak]

dhananjays avatar Mar 24 '22 20:03 dhananjays

Hoi! Nice to hear that we are not alone!

We use LSAN as part of ASAN and if you use a suppression list you need to configure ASAN to recover from failures:

-fsanitize-recover=address -fsanitize-address-use-after-scope

these are the compile flags we use in addition to -fsanitize=address

theShmoo avatar Mar 24 '22 20:03 theShmoo

Certainly you are not alone. Suppressing the warnings does not feel so right :| We end up using thread_local to call s2n_cleanup(); whenever a thread is terminated. Now valgrind does not complain any more. We are happy again.

File: S2nCleanup.h

#include <s2n.h>

class S2nCleanup {
public:
    ~S2nCleanup() {
        s2n_cleanup();
    }
};

Any where we use s3Client.

thread_local S2nCleanup s2nCleanup{};

...

Aws::S3::Model::GetObjectOutcome outcome = s3Client->GetObject(request);

...

databucketio avatar Mar 26 '22 21:03 databucketio

Nice! Thanks for your tipp with the thread_local variable. This also works for my use case.

Maybe it will help others as well:

I use s3 in combination with https://github.com/yhirose/cpp-httplib

to get your fix working I need to create a custom thread pool and add the thread_local s2n clean up:

proof of concept:

struct thread_cleanup
{
  ~thread_cleanup()
  {
    s2n_cleanup();
  }
};

class task_queue : public httplib::TaskQueue
{
 public:
  explicit task_queue(size_t n) : m_pool(n) {}

  void enqueue(std::function<void()> fn) override
  {
    m_pool.enqueue(
        [f = std::move(fn)]
        {
          f();
          thread_local thread_cleanup cleanup{};
        });
  }

  void shutdown() override
  {
    m_pool.shutdown();
  }

 private:
  httplib::ThreadPool m_pool;
};

// when configuring the server:

m_server->new_task_queue = []
{
  // NOLINTNEXTLINE
  return new task_queue(12);
};



theShmoo avatar Mar 29 '22 07:03 theShmoo

This PR fixed a memory leak. Please let me know if you are still seeing any more leaks with the current version of this sdk.

jmklix avatar Mar 06 '23 20:03 jmklix

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

github-actions[bot] avatar Mar 09 '23 00:03 github-actions[bot]