aws-sdk-cpp
dynamodb concurrent GetItem timeout on ubuntu22.04
Describe the bug
In an ubuntu22.04 environment, timeouts occur when concurrent GetItem requests are sent to dynamodb. Here is my code:
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include <aws/core/Aws.h>
#include <aws/dynamodb/DynamoDBClient.h>
#include <aws/dynamodb/model/GetItemRequest.h>
using namespace Aws::DynamoDB::Model;
constexpr uint num_threads = 100;
std::atomic<uint> counter = 0;
GetItemRequest get_rand_item_req() {
GetItemRequest req;
req.SetTableName("test.rand");
req.SetConsistentRead(true);
req.AddKey("id", AttributeValue().SetN(std::to_string(rand() % 1000)));
return req;  // returning the local by name allows copy elision; std::move would prevent it
}
void worker() {
Aws::Client::ClientConfiguration config;
config.region = "ap-northeast-1";
Aws::DynamoDB::DynamoDBClient client(config);
for (uint i = 0; i < 5; ++i) {
GetItemRequest req = get_rand_item_req();
GetItemOutcome outcome = client.GetItem(req);
if (!outcome.IsSuccess()) {
std::cout << outcome.GetError() << std::endl;
break;
}
counter++;
}
}
int main() {
Aws::SDKOptions options;
Aws::InitAPI(options);
auto start = std::chrono::system_clock::now();
std::vector<std::thread> workers;
for (uint i = 0; i < num_threads; ++i) {
workers.emplace_back(worker);
}
for (auto &w : workers) {
w.join();
}
uint elapsed = std::chrono::duration_cast<std::chrono::seconds>(
std::chrono::system_clock::now() - start)
.count();
std::cout << "finished counter " << counter << std::endl;
std::cout << "elapsed seconds " << elapsed << std::endl;
Aws::ShutdownAPI(options);
return 0;
}
Expected Behavior
I expect it to output this without any error
finished counter 500
elapsed seconds 0
Current Behavior
HTTP response code: -1
Resolved remote host IP address: 13.248.70.8
Request ID:
Exception name:
Error message: curlCode: 28, Timeout was reached
0 response headers:
HTTP response code: -1
Resolved remote host IP address: 13.248.70.8
Request ID:
Exception name:
Error message: curlCode: 28, Timeout was reached
0 response headers:
finished counter 320
elapsed seconds 268
Reproduction Steps
- configure your aws credential
- compile this code
- run
Possible Solution
No response
Additional Information/Context
It works fine if I set num_threads=1, so that GetItem is executed in only one thread.
finished counter 5
elapsed seconds 0
It also works fine if I change the operating system to ubuntu20.04.
AWS CPP SDK version used
1.11.90
Compiler and Version used
gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Operating System and version
Ubuntu 22.04.2 LTS
Hey, thanks for reaching out. This looks like a timeout in the underlying curl connection. We have some levers for that which I would suggest trying out, specifically:
/**
* Socket connect timeout. Default 1000 ms. Unless you are very far away from your the data center you are talking to, 1000ms is more than sufficient.
*/
long connectTimeoutMs = 1000
Since you are using the ap region, I'm not sure where you are running from or how far you are from the data center. Try increasing that number to catch the long tail of connection times.
Secondly, a usage recommendation: it looks like you create and destroy a new client in each worker thread. With each creation/destruction, a curl handle is created and destroyed. I would suggest using only one client.
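The single-client suggestion can be sketched roughly as below. So that the sketch compiles without the SDK, the client is a template parameter and `StubClient`, `get_item`, and `run_shared` are hypothetical names, not SDK APIs. In the real program the shared object would be one `Aws::DynamoDB::DynamoDBClient` built once in `main` from a `ClientConfiguration` with a larger `connectTimeoutMs`, and `get_item` would wrap `client.GetItem(req)`. This assumes the client's request methods are safe to call concurrently on one instance.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the SDK client; it only counts calls.
struct StubClient {
    mutable std::atomic<unsigned> calls{0};
    bool get_item(unsigned /*id*/) const {
        ++calls;
        return true;  // a real call would report outcome.IsSuccess()
    }
};

// Starts num_threads workers that all share ONE client by const reference,
// instead of constructing a client inside every worker.
// Returns the number of successful requests.
template <class Client>
unsigned run_shared(const Client& client, unsigned num_threads, unsigned per_thread) {
    assert(num_threads > 0);
    std::atomic<unsigned> ok{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&client, &ok, per_thread, t] {
            for (unsigned i = 0; i < per_thread; ++i) {
                if (!client.get_item(t * per_thread + i)) break;  // stop on first error
                ++ok;
            }
        });
    }
    for (auto& w : workers) w.join();
    return ok.load();
}
```

The client outlives all workers because `run_shared` joins them before returning, so passing it by reference is safe here.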
Let me know if you are still seeing the issue after the configuration changes.
Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.
Thanks, I'm trying your solution.
I am running this test program on an EC2 c5.xlarge instance (ap-northeast-1c). This is my new code using only one client: https://github.com/howz97/dynamo_test/tree/simple_share_client
I got this poor performance on ubuntu22.04
finished counter 500
elapsed seconds 156
while I got the expected performance on ubuntu20.04
finished counter 500
elapsed seconds 0
Hi @howz97,
Thank you for providing an update.
May I also suggest disabling enableEndpointDiscovery in your client configuration? For example:
Aws::Client::ClientConfiguration config;
config.region = "ap-northeast-1";
config.enableEndpointDiscovery = false;
client = std::make_unique<Aws::DynamoDB::DynamoDBClient>(config);
This feature performs an additional service call to "discover the actual endpoint to call". I'm sorry that it is enabled by default; it is legacy default behavior, and I hope we will change the default to false in the next API version update.
Also, just in case: your test code also measures thread creation. In my experience with the SDK, the very first call to DynamoDB will be slightly slower than later ones (at least because of DNS resolution).
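One way to separate those effects is to time each call on its own rather than the whole run. The following is only a minimal sketch: `time_calls` and `max_after_warmup` are hypothetical helper names, and `request` stands in for whatever wraps `client.GetItem(req)` in the real test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Calls `request` n times and returns each call's latency in milliseconds.
// Thread creation and client construction are outside the measured region.
template <class Request>
std::vector<double> time_calls(Request request, std::size_t n) {
    std::vector<double> ms;
    ms.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        auto start = std::chrono::steady_clock::now();
        request();
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Largest latency after the first call, which is treated as warm-up
// (DNS resolution, connection setup).
double max_after_warmup(const std::vector<double>& ms) {
    assert(ms.size() >= 2);
    return *std::max_element(ms.begin() + 1, ms.end());
}
```

Comparing `ms[0]` with `max_after_warmup(ms)` shows whether the time goes into the first call or into the steady-state requests.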
Please let us know if it improves the performance you observe.
Best regards, Sergey
I still got poor performance after setting enableEndpointDiscovery=false.
finished counter 500
elapsed seconds 148
Spending more than 2 minutes to request 500 items is too long, and this only happens on ubuntu22.04. The overhead of thread creation cannot be that high, because this program completes in 1 second on ubuntu20.04. Both test machines are EC2 c5.xlarge.