aws-sdk-cpp
Aws::Crt::Io::ClientBootstrap destructor may launch thread at process exit, and crash
Confirm by changing [ ] to [x] below to ensure that it's a bug:
- [x] I've gone through Developer Guide and API reference
- [x] I've searched for previous similar issues and didn't find any solution
Describe the bug We're getting a user report of a crash, seemingly at process shutdown, on Windows: https://github.com/conda-forge/arrow-cpp-feedstock/issues/567
Apparently the ClientBootstrap destructor can indirectly trigger the launch of a new thread using aws_thread_launch. The thread launch fails at process shutdown, at least on Windows, triggering an assertion error and therefore a process crash.
SDK version number 1.9.120
Platform/OS/Hardware/Device Windows/10.0.17763 (also reported on CentOS 8 and Ubuntu: https://issues.apache.org/jira/browse/ARROW-15141)
To Reproduce (observed behavior) Basically https://github.com/conda-forge/arrow-cpp-feedstock/issues/567#issue-1047929850, but I'm not sure what the exact steps are (I'm not the original reporter).
Expected behavior Failing to launch a thread at process shutdown should probably not crash the process.
@xhochy
Does not happen with SDK version 1.8.186.
In case it could help narrow down the source of the bug, I tested a few different versions of aws-sdk-cpp on CentOS 7:
- No issue: 1.8.186
- Bug: 1.9.120, 1.9.140
Hi @jdblischak , Can you share how you are reproducing this? In the post linked by the op there seems to be a fix provided by conda-forge so I'm wondering if this is on their side rather than the sdk?
The fix on the conda-forge side was to revert back to the 1.8.186 SDK version. With the current issue, we cannot use a newer SDK on Windows.
@KaibaLopez Thanks for following up
In the post linked by the op there seems to be a fix provided by conda-forge so I'm wondering if this is on their side rather than the sdk?
As @xhochy commented, the conda-forge workaround is to pin to an older version of aws-sdk-cpp. Personally I fixed it by specifying aws-sdk-cpp=1.8.186=h9ad65fb_2 for my conda env.
Can you share how you are reproducing this?
I was able to reproduce the bug using the code below:
mamba create -n test-aws python=3.9 pandas=1.2 pyarrow=2.0 aws-sdk-cpp=1.9.120
conda activate test-aws
python test-arrow.py
where test-arrow.py is the reproducible example script copied from https://github.com/conda-forge/arrow-cpp-feedstock/issues/567
import numpy as np
import pandas as pd

def test_error():
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
    df.to_parquet('test.parquet')

if __name__ == '__main__':
    test_error()
Here is the full error and traceback that I observe:
% python test-arrow.py
Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
################################################################################
Stack trace:
################################################################################
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59) [0x2aaac1581f19]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48) [0x2aaac1573098]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43) [0x2aaac17bca43]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x2aaac1583fad]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a) [0x2aaac17ba35a]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x2aaac1583fad]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a) [0x2aaac1526f5a]
~/mambaforge/envs/test-aws/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570) [0x2aaac0faa570]
/lib64/libc.so.6(+0x39c99) [0x2aaaab835c99]
/lib64/libc.so.6(+0x39ce7) [0x2aaaab835ce7]
/lib64/libc.so.6(__libc_start_main+0xfc) [0x2aaaab81e50c]
python(+0x20aa51) [0x55555575ea51]
Aborted
Update: this error has been reported on Ubuntu and CentOS as well: https://issues.apache.org/jira/browse/ARROW-15141
@ihnorton have you run into something similar for tiledb?
No, we are still on 1.8, and that backtrace does not ring a bell.
Hi. Just to say we have hit this exact same issue using v1.9.72 within our in-house build at MathWorks.
We also see this on Windows now (while updating).
- @KaibaLopez the code in question is here:
https://github.com/awslabs/aws-c-io/blob/b5cad3d21018e84a5084d6e191661fa604b49f0c/source/event_loop.c#L73-L75
aws_thread_launch uses the win32 CreateThread API:
https://github.com/awslabs/aws-c-common/blob/cba230815132f53206c501874e03a286765fb225/source/windows/thread.c#L258-L259
- it is documented here that CreateThread is not valid when the process is exiting:
The ExitProcess, ExitThread, CreateThread, CreateRemoteThread functions, and a process that is starting (as the result of a call by CreateProcess) are serialized between each other within a process. Only one of these events can happen in an address space at a time. This means that the following restrictions hold:
- you can see a backtrace here where this error message happens after ExitProcess has been called
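To illustrate the "expected behavior" from the original report, here is a minimal sketch of cleanup code that tolerates a failed thread launch instead of asserting. It is not the aws-c-io implementation and not necessarily what any fix there does. `run_cleanup_async_or_inline`, `Launcher` and `default_launch` are hypothetical names, and the injectable launcher exists only so the failure path can be exercised without actually being at process exit. The sketch uses `std::thread`, whose constructor throws `std::system_error` when the thread cannot be started, and it joins the thread to stay simple.

```cpp
#include <functional>
#include <system_error>
#include <thread>
#include <utility>

// Hypothetical types and helpers for illustration; not part of aws-c-io or the SDK.
using Cleanup = std::function<void()>;
using Launcher = std::function<std::thread(Cleanup)>;

// Default launcher: start the cleanup on a new thread.
// std::thread's constructor throws std::system_error if the launch fails.
inline std::thread default_launch(Cleanup fn) {
    return std::thread(std::move(fn));
}

// Try to run `cleanup` on a separate thread. If the thread cannot be
// launched, run the cleanup inline on the calling thread instead of aborting.
// Returns true if the cleanup ran on a separate thread, false if it ran inline.
inline bool run_cleanup_async_or_inline(const Cleanup& cleanup,
                                        const Launcher& launch = default_launch) {
    try {
        std::thread worker = launch(cleanup);
        worker.join();  // The sketch waits here; real code might hand the thread off.
        return true;
    } catch (const std::system_error&) {
        cleanup();      // Fallback: do the work on the current thread.
        return false;
    }
}
```

Passing a launcher that throws `std::system_error` simulates the failed launch. In that case the cleanup still runs exactly once, on the calling thread. Whether inline cleanup is safe in the real event loop code depends on which thread calls the destructor, which this sketch does not model.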
Was this resolved by https://github.com/awslabs/aws-c-io/pull/515?
Was this resolved by awslabs/aws-c-io#515?
I don't know this repo, but from looking around, it seems that:
- the referenced PR is part of aws-c-io 0.13.5
- latest aws-sdk-cpp release seems to use aws-c-io 0.10.20
The last update of the aws-c-io version used in this repo was in June (NB: at which point 5 newer releases of aws-c-io would have been already available). Perhaps @sdavtaker (author of last update) can illuminate the process of what's necessary to update the respective dependencies.
This issue is biting us quite hard in conda-forge, made worse by the fact that aws-sdk-cpp 1.8 does not seem compatible anymore with current versions of the rest of the aws-c-* stack (which we need to unbundle for several reasons).
I also still see regular discussions about this problem in other repos, e.g. https://github.com/huggingface/datasets/issues/3310
Furthermore, this bug also happens outside pyarrow: I use the AWS SDK in a standalone Windows C program, and it crashes during exit.
So it would be really good if we could upgrade aws-c-io here and then determine if that actually fixes things...
pyarrow 10.0.1 was just released in conda-forge, which is the first release where we're building against aws-sdk-cpp 1.9.* again after more than a year. Since we cannot test the failure reported here on our infra, I'd be very grateful if someone could verify that the problem does or doesn't reappear. 🙃
conda install -c conda-forge pyarrow=10
Edit: if things are fine, I'm happy to backport this to arrow 6.x-9.x.
Confirmed. Thanks @h-vetinari! See reproducible example at https://github.com/conda-forge/arrow-cpp-feedstock/issues/567#issuecomment-1344764356
In case someone else is still facing it...
I had the same issue, but it was caused because Aws::ShutdownAPI was not being called correctly.
https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/basic-use.html
As @cardinotGV said please make sure that you are calling InitAPI and ShutdownAPI correctly:
#include <aws/core/Aws.h>
int main(int argc, char** argv)
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // make your SDK calls here.
    }
    Aws::ShutdownAPI(options);
    return 0;
}
If you are still running into any crashes at process exit please let me know
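For anyone wondering why the inner braces in that pattern matter: they force every SDK object to be destroyed before ShutdownAPI runs, rather than at process exit. The following self-contained sketch shows only the ordering. `FakeSdk`, `FakeClient` and `run_scoped` are stand-ins invented for illustration, not SDK types, so it compiles without the SDK.

```cpp
#include <string>
#include <vector>

// Stand-ins for illustration only; these are NOT SDK types or functions.
struct FakeSdk {
    std::vector<std::string> events;
    void init() { events.push_back("init"); }          // plays the role of InitAPI
    void shutdown() { events.push_back("shutdown"); }  // plays the role of ShutdownAPI
};

struct FakeClient {
    explicit FakeClient(FakeSdk& sdk) : sdk_(sdk) {
        sdk_.events.push_back("client created");
    }
    ~FakeClient() { sdk_.events.push_back("client destroyed"); }
    FakeSdk& sdk_;
};

// Mirrors the init / scoped block / shutdown layout and records the order
// in which things happen.
std::vector<std::string> run_scoped() {
    FakeSdk sdk;
    sdk.init();
    {
        FakeClient client(sdk);  // lives only inside this block
    }                            // client destructor runs here, before shutdown
    sdk.shutdown();
    return sdk.events;
}
```

Without the inner block, a client declared at the top of the function would outlive the shutdown call. A client held in a global or static variable would be destroyed even later, during process exit, which is the situation this issue describes.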