aws-sdk-cpp
curlCode: 6, Couldn't resolve host name
Describe the bug
After about 10 minutes of intensive list, get, put, and delete requests to a specific bucket we get curlCode: 6, Couldn't resolve host name.
The client is an EC2 instance. Besides the SDK's built-in retries, the client code retries a few times as well, but the error persists. After some time, with no code changes, everything works again, yet it still fails after about 10 minutes of requests. The EC2 instance and the bucket are both in the us-east-1 region.
Expected Behavior
No errors
Current Behavior
We enabled the AWS SDK debug log. Here is a snippet:
[DEBUG] 2022-04-28 19:21:49.640 AWSAuthV4Signer [139760352270080] Canonical Request String: GET
/
list-type=2&prefix=...
amz-sdk-invocation-id:5B5B995D-12C7-4E88-A63E-C66E553A5F51
amz-sdk-request:attempt=6; max=11
content-type:application/xml
host:foo.s3.us-east-1.amazonaws.com
x-amz-api-version:2006-03-01
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20220428T192149Z
amz-sdk-invocation-id;amz-sdk-request;content-type;host;x-amz-api-version;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
[DEBUG] 2022-04-28 19:21:49.640 AWSAuthV4Signer [139760352270080] Final String to sign: AWS4-HMAC-SHA256
20220428T192149Z
20220428/us-east-1/s3/aws4_request
b4d2f505824b36c129243ece09bb5d665aaf760c88dd90a6b4497f127e2481cc
[DEBUG] 2022-04-28 19:21:49.640 AWSAuthV4Signer [139760352270080] Final computed signing hash: f8f0dd9bf9aee42bc5f91f2a6917fadf2c06718adfc15a10bbb8cd2b8e0887d9
[DEBUG] 2022-04-28 19:21:49.640 AWSAuthV4Signer [139760352270080] Signing request with: AWS4-HMAC-SHA256 Credential=AKIAX3GSQRN3HSJPB5CG/20220428/us-east-1/s3/aws4_request, SignedHeaders=amz-sdk-invoca
tion-id;amz-sdk-request;content-type;host;x-amz-api-version;x-amz-content-sha256;x-amz-date, Signature=f8f0dd9bf9aee42bc5f91f2a6917fadf2c06718adfc15a10bbb8cd2b8e0887d9
[DEBUG] 2022-04-28 19:21:49.640 AWSClient [139760352270080] Request Successfully signed
[DEBUG] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] Attempting to acquire curl connection.
[DEBUG] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] No current connections available in pool. Attempting to create new connections.
[DEBUG] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] attempting to grow pool size by 2
[INFO] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] Pool grown by 2
[INFO] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] Connection has been released. Continuing.
[DEBUG] 2022-04-28 19:21:49.640 CurlHandleContainer [139760352270080] Returning connection handle 0x7f1bf887ad90
[DEBUG] 2022-04-28 19:21:49.640 CurlHttpClient [139760352270080] Obtained connection handle 0x7f1bf887ad90
[DEBUG] 2022-04-28 19:21:49.641 CURL [139760352270080] (Text) Could not resolve host: foo.s3.us-east-1.amazonaws.com
[DEBUG] 2022-04-28 19:21:49.641 CURL [139760352270080] (Text) Closing connection 0
[ERROR] 2022-04-28 19:21:49.641 CurlHttpClient [139760352270080] Curl returned error code 6 - Couldn't resolve host name
[DEBUG] 2022-04-28 19:21:49.641 CurlHandleContainer [139760352270080] Destroy curl handle: 0x7f1bf887ad90 and decrease pool size by 1.
[DEBUG] 2022-04-28 19:21:49.641 AWSClient [139760352270080] Request returned error. Attempting to generate appropriate error codes from response
[ERROR] 2022-04-28 19:21:49.641 AWSClient [139760352270080] HTTP response code: -1
Resolved remote host IP address:
Request ID:
Exception name:
Error message: curlCode: 6, Couldn't resolve host name
0 response headers:
Reproduction Steps
A suite of get, put, list, and delete requests.
Possible Solution
No response
Additional Information/Context
No response
AWS CPP SDK version used
1.8.3
Compiler and Version used
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Operating System and version
CentOS Linux release 7.7.1908 (Core)
Hey @vudh1, just wanted to check in and see whether there is any ETA on this. Thanks!
I changed my client code to create a new S3Client for every request. It looks like the error is no longer happening. So, I would conclude that something is going wrong with long-running S3Client instances, and resetting them clears it up. Any guidance from the maintainers would be appreciated.
Hi, do you have a sample of reproducible code that we can use to investigate?
Unfortunately, the code using the AWS SDK is embedded in a larger piece of software, and pulling it out for a repro would be prohibitive.
Just to point out that the application worked fine for about a year and we had not seen such errors before. We did not change the AWS SDK version or the code before this started happening. We have seen the same behavior, in about the same time frame, in an unrelated application using a different language binding of the AWS SDK. So, based on this empirical evidence, I would conclude that the issue is neither the SDK nor our code, but a change that took place on the S3 side about a week ago.
Now, our application is a long running process, making S3 requests from time to time in batches. The original implementation creates an S3Client instance for each batch of S3 requests. This implementation has the issue described in the top comment.
We tried two ways of changing our implementation and both made the error go away:
- We create an S3Client for each S3 request (as mentioned in my previous comment)
- We create only one S3Client instance for the life of the process
I wish AWS would provide more guidance on how the SDK should be used in such applications and how we should deal with this type of error.
I am running some simple tests in a loop to see if I can reproduce the same behavior. In the meantime, can you check if this problem still persists with the current version of the SDK (1.9.251)?
Got it. Just in case it helps, here is our module that interacts with AWS: https://github.com/Paradigm4/bridge/blob/xremove/src/S3Driver.cpp The last two functions, _retryLoop and _deleteAndUpdate, are the ones that make the actual AWS requests.
Hi @rvernica, I see that in the code you linked, a few changes (like here) might have been made as a workaround, right? We are trying to find the cause of the behavior, so can you send the reproducible code without those changes?
The change you linked above was made to recover in case we encounter a curlCode: 6 error.
The workaround change was actually done in this commit by moving the client = std::make_unique&lt;Aws::S3::S3Client&gt;(); instance from the S3Driver class to the S3Init class.
In more detail: before the change, the S3Client instance was inside our S3Driver class. This meant that an S3Client instance was created for each batch of S3 requests. The approach worked fine for a year, but recently resulted in the curlCode: 6 errors.
After the change, the S3Client instance was moved inside our S3Init class. This class is initialized once per process, so only one S3Client instance is created. This made the curlCode: 6 error go away.
Could this issue be related to https://github.com/aws/aws-sdk-cpp/issues/1614? It looks like it was reported but never fixed.
Hi guys, I had the same problem. In my case, the problem appeared after accessing the API more than 1011 times.
In my original code, I initialized the API (InitAPI) on each S3 operation.
I moved InitAPI to main (instead of calling it on each S3 operation), which solved the problem for me.
e.g.
#include &lt;aws/core/Aws.h&gt;

int main()
{
    Aws::SDKOptions options;
    options.loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Debug;
    Aws::InitAPI(options);  // call once per process, before any other SDK use
    {
        // ... your code ... (create and destroy all SDK objects inside this scope)
    }
    Aws::ShutdownAPI(options);  // call once, after all SDK objects are destroyed
    return 0;
}
I am using Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never to fix this; you can give it a try. @rvernica
auto s3_client = Aws::MakeShared&lt;Aws::S3::S3Client&gt;(
    "allocationTag",
    Aws::Auth::AWSCredentials(Aws::String(ACCESS_KEY_ID), Aws::String(SECRET_KEY)),
    config,
    Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never,
    false);
@rvernica are you still running into this bug? If so, does using Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never fix it for you?
Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.