crash in Aws::Utils::Threading::PooledThreadExecutor
Confirm by changing [ ] to [x] below to ensure that it's a bug:
- [x] I've gone though Developer Guide and API reference
- [x] I've searched for previous similar issues and didn't find any solution
Describe the bug
When the application destroy shared pointer to PooledThreadExecutor and the executor has still some running tasks it will lead to crash.
SDK version number 1.8.185
Platform/OS/Hardware/Device Linux / Debian Buster
To Reproduce (observed behavior) Consider following code
{
{
std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
// initialize executor
std::shared_ptr<Aws::S3::S3Client> s3Client;
// initialize client
Aws::Transfer::TransferManagerConfiguration transfer_config(executor.get());
transfer_config.s3Client = s3Client;
// just to reproduce the problem which rarely happen
transfer_config.transferStatusUpdatedCallback =
[&](const Aws::Transfer::TransferManager*,
const std::shared_ptr<const Aws::Transfer::TransferHandle>& handle) {
if (handle->GetStatus() == Aws::Transfer::TransferStatus::COMPLETED)
std::this_thread::sleep_for(std::chrono::seconds(30));
};
auto transferManager = Aws::Transfer::TransferManager::Create(transfer_config);
auto handle = transferManager->UploadFile(/* args */);
handle->WaitUntilFinished();
}
std::this_thread::sleep_for(std::chrono::seconds(60));
}
which can lead to the appended crash.
The main problem is WaitUntilFinished() is notified from lambda in the executor. Particularly from TransferManager::Handle PutObjectResponse, so it stops blocking. All shared pointer are destroyed, but it doesn't mean that also Instances are destroyed. And here is a race condition in PooledThreadExecutor.
void ThreadTask::MainTaskRunner()
{
while (m_continue)
{
while (m_continue && m_executor.HasTasks())
{
auto fn = m_executor.PopTask();
if(fn)
{
(*fn)(); // the lambda keeps shared pointer of `TransferManager` and indirectly also `PooledThreadExecutor`
Aws::Delete(fn);
}
}
if(m_continue)
{
m_executor.m_sync.WaitOne();
}
}
}
- if a task with lambda has been destroyed in
Aws::Delete(fn);then refcount of shared pointerPooledThreadExecutoris 1 and it will be destroyed in main thread - this is correct behavior - if main thread leaves block before then refcount of shared pointer
PooledThreadExecutoris higher then 1 and it keeps living in a lamda. Later, deleting the lambda triggers also destroying the executor. Destructor of executor blocks calling thread, because it join all his threads. But her the calling thread is from task and tries to join him self. Yes, in coredump the thread number is the same. The exception rised fromstd::thread::joinisstd::system_errorpthread_join error : Resource deadlock avoided
ThreadTask::~ThreadTask()
{
StopProcessingWork();
m_thread.join();
}
Expected behavior
I don't see any fix except a public function in PooledThreadExecutor which will block until the executor is fully drained. E.g.
{
{
std::shared_ptr<Aws::Utils::Threading::PooledThreadExecutor> executor;
// initialize executor
...
handle->WaitUntilFinished();
executor->WaitUntilStopped();
}
std::this_thread::sleep_for(std::chrono::seconds(60));
}
or any other method to safely destroy the executor.
Additional context
backtrace from coredump
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fab7b5d4535 in __GI_abort () at abort.c:79
#2 0x00007fab7b99c983 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007fab7b9a28c6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007fab7b9a2901 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00007fab7b9a2b34 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fab60988a80, tinfo=tinfo@entry=0x7fab7ba87ae0 <typeinfo for std::system_error>,
dest=dest@entry=0x7fab7b9cb8c0 <std::system_error::~system_error()>) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x00007fab7b99ea55 in std::__throw_system_error (__i=35) at /build/gcc-8-Ev0Tjh/gcc-8-8.3.0/build/x86_64-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:86
#7 0x00007fab7b9cbd6c in std::thread::join (this=this@entry=0x5568078ad650) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:113
#8 0x0000556805b9fff2 in Aws::Utils::Threading::ThreadTask::~ThreadTask (this=0x5568078ad640, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:19
#9 0x0000556805ba0079 in Aws::Delete<Aws::Utils::Threading::ThreadTask> (pointerToT=0x5568078ad640)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:94
#10 Aws::Utils::Threading::PooledThreadExecutor::~PooledThreadExecutor (this=0x5568078ab6e0, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/Executor.cpp:96
#11 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#12 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078ab6d0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#13 0x0000556805adcf93 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078aec60, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#14 std::__shared_ptr<Aws::Utils::Threading::Executor, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>)
at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#15 std::shared_ptr<Aws::Utils::Threading::Executor>::~shared_ptr (this=0x5568078aec58, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#16 Aws::S3::S3Client::~S3Client (this=0x5568078aeb00, __in_chrg=<optimized out>) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-s3/source/S3Client.cpp:161
#17 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#18 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078aeaf0) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#19 0x0000556805a702c5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x5568078f9730, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#20 std::__shared_ptr<Aws::S3::S3Client, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#21 std::shared_ptr<Aws::S3::S3Client>::~shared_ptr (this=0x5568078f9728, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#22 Aws::Transfer::TransferManagerConfiguration::~TransferManagerConfiguration (this=0x5568078f9728, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/include/aws/transfer/TransferManager.h:38
#23 Aws::Transfer::TransferManager::~TransferManager (this=0x5568078f96a0, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:68
#24 0x0000556805a2dcc7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#25 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5568078f9690) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#26 0x0000556805a66f14 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fab6014c9c8, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#27 std::__shared_ptr<Aws::Transfer::TransferManager, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#28 std::shared_ptr<Aws::Transfer::TransferManager>::~shared_ptr (this=0x7fab6014c9c0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#29 Aws::Transfer::TransferManager::<lambda(const Aws::Http::HttpRequest*, long long int)>::~<lambda> (this=0x7fab6014c9c0, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp:457
#30 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#31 std::_Function_base::_Base_manager<Aws::Transfer::TransferManager::DoSinglePartUpload(const std::shared_ptr<std::basic_iostream<char> >&, const std::shared_ptr<Aws::Transfer::TransferHandle>&)::<lambda(const Aws::Http::HttpRequest*, long long int)> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
at /usr/include/c++/8/bits/std_function.h:212
#32 0x0000556805ba5dd1 in std::_Function_base::~_Function_base (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#33 std::function<void (Aws::Http::HttpRequest const*, long long)>::~function() (this=0x7fab6098a650, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#34 Aws::AmazonWebServiceRequest::~AmazonWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/AmazonWebServiceRequest.h:43
#35 Aws::AmazonStreamingWebServiceRequest::~AmazonStreamingWebServiceRequest (this=0x7fab6098a608, __in_chrg=<optimized out>)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/AmazonStreamingWebServiceRequest.cpp:11
#36 0x0000556805b09b11 in Aws::S3::S3Client::<lambda()>::~<lambda> (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#37 std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()>::~_Bind (this=0x7fab6098a600, __in_chrg=<optimized out>) at /usr/include/c++/8/functional:386
--Type <RET> for more, q to quit, c to continue without paging--
#38 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_destroy (__victim=...) at /usr/include/c++/8/bits/std_function.h:188
#39 std::_Function_base::_Base_manager<std::_Bind<Aws::S3::S3Client::PutObjectAsync(const Aws::S3::Model::PutObjectRequest&, const PutObjectResponseReceivedHandler&, const std::shared_ptr<const Aws::Client::AsyncCallerContext>&) const::<lambda()>()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation) (__dest=..., __source=..., __op=<optimized out>)
at /usr/include/c++/8/bits/std_function.h:212
#40 0x0000556805b9eb23 in std::_Function_base::~_Function_base (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:257
#41 std::function<void ()>::~function() (this=0x7fab608bbda0, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/std_function.h:370
#42 Aws::Delete<std::function<void ()> >(std::function<void ()>*) (pointerToT=0x7fab608bbda0)
at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/include/aws/core/utils/memory/AWSMemory.h:101
#43 Aws::Utils::Threading::ThreadTask::MainTaskRunner (this=0x5568078ad640) at ./obj-x86_64-linux-gnu/_deps/sdk-src/aws-cpp-sdk-core/source/utils/threading/ThreadTask.cpp:32
#44 0x00007fab7b9cbb2f in std::execute_native_thread_routine (__p=0x5568078b0740) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:80
#45 0x00007fab7c0c9fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#46 0x00007fab7b6ab4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thanks for the detailed write up and PR. I'll see if we can get it reviewed and merged.
Hello, is there an update on this bug fix? I am seeing the same issue. Thanks!
Hi, A new blocking wait for completion method for the TransferManager was merged: https://github.com/aws/aws-sdk-cpp/pull/2291
Please call pTransferManager->WaitUntilAllFinished() once you've finished the file transfer, or before leaving the scope of executor, or before the ShutdownAPI() call.
It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.
Noted, thank you for the update
It does not fix the original issue reported here though, we will also check the original/root issue of PooledThreadExecutor cyclic references.
I think, it cannot be solved otherwise than by suggest MR.
The main issue is passing pool as shared pointer. Then it cannot be guaranteed lifetime of object in main thread. This example is simplified architecture of the PooledThreadExecutor.
int main() {
struct Pool {
std::thread thread;
~Pool()
{
if (thread.joinable()) thread.join();
std::cout << "~Pool() finished\n";
}
};
std::cout << "1. simple\n";
{
auto pool = std::make_shared<Pool>();
pool->thread = std::thread{[=]() {
std::this_thread::sleep_for(std::chrono::seconds{5});
std::cout << "Task finished\n";
}};
}
std::cout << "2. with copy\n";
{
auto pool = std::make_shared<Pool>();
pool->thread = std::thread{[=]() {
auto copy = pool;
std::this_thread::sleep_for(std::chrono::seconds{5});
std::cout << "Task finished\n";
}};
}
std::cout << std::flush;
}
with output
1. simple
Task finished
~Pool() finished
2. with copy
The second thread was still running when the process ended because i copied shared pointer into task.
It can be seen, that it cannot be safely used. You are not able to prevent similar using. The real owner must ensure call something like poll->join() before leaving scope where he expected the pool will be destroyed.
This should be fixed with this merged PR: https://github.com/aws/aws-sdk-cpp/pull/2787
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.