aws-sdk-cpp

dynamodb concurrent GetItem timeout on ubuntu22.04

Open howz97 opened this issue 1 year ago • 6 comments

Describe the bug

In an Ubuntu 22.04 environment, requests time out when concurrent GetItem requests are sent to DynamoDB. Here is my code:

#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

#include <aws/core/Aws.h>
#include <aws/dynamodb/DynamoDBClient.h>
#include <aws/dynamodb/model/GetItemRequest.h>

using namespace Aws::DynamoDB::Model;

constexpr uint num_threads = 100;
std::atomic<uint> counter = 0;

GetItemRequest get_rand_item_req() {
  GetItemRequest req;
  req.SetTableName("test.rand");
  req.SetConsistentRead(true);
  req.AddKey("id", AttributeValue().SetN(std::to_string(rand() % 1000)));
  return req;  // returning the local by name allows copy elision; std::move would prevent it
}

void worker() {
  Aws::Client::ClientConfiguration config;
  config.region = "ap-northeast-1";
  Aws::DynamoDB::DynamoDBClient client(config);
  for (uint i = 0; i < 5; ++i) {
    GetItemRequest req = get_rand_item_req();
    GetItemOutcome outcome = client.GetItem(req);
    if (!outcome.IsSuccess()) {
      std::cout << outcome.GetError() << std::endl;
      break;
    }
    counter++;
  }
}

int main() {
  Aws::SDKOptions options;
  Aws::InitAPI(options);

  auto start = std::chrono::system_clock::now();
  std::vector<std::thread> workers;
  for (uint i = 0; i < num_threads; ++i) {
    workers.emplace_back(worker);
  }
  for (auto &w : workers) {
    w.join();
  }
  uint elapsed = std::chrono::duration_cast<std::chrono::seconds>(
                     std::chrono::system_clock::now() - start)
                     .count();
  std::cout << "finished counter " << counter << std::endl;
  std::cout << "elapsed seconds " << elapsed << std::endl;

  Aws::ShutdownAPI(options);
  return 0;
}

Expected Behavior

I expect it to output this without any error

finished counter 500
elapsed seconds 0

Current Behavior

HTTP response code: -1
Resolved remote host IP address: 13.248.70.8
Request ID:
Exception name:
Error message: curlCode: 28, Timeout was reached
0 response headers:
HTTP response code: -1
Resolved remote host IP address: 13.248.70.8
Request ID:
Exception name:
Error message: curlCode: 28, Timeout was reached
0 response headers:
finished counter 320
elapsed seconds 268

Reproduction Steps

  1. Configure your AWS credentials
  2. Compile this code
  3. Run it

Possible Solution

No response

Additional Information/Context

It works fine if I set num_threads=1, so that GetItem is executed in only one thread:

finished counter 5
elapsed seconds 0

It also works fine if I change the operating system to Ubuntu 20.04.

AWS CPP SDK version used

1.11.90

Compiler and Version used

gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

Operating System and version

Ubuntu 22.04.2 LTS

howz97 Jun 02 '23 04:06

Hey, thanks for reaching out. This seems to be a timeout in the underlying curl connection. We have some levers for that which I would suggest trying out, specifically:

/**
 * Socket connect timeout. Default 1000 ms. Unless you are very far away from the data center you are talking to, 1000 ms is more than sufficient.
 */
long connectTimeoutMs = 1000;

Since you are using the ap region, I'm not sure where you are running from or how far you are from the data center. Try increasing that number to catch the long tail of connection times.
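As a rough illustration of the connect-timeout suggestion, the setting is a field on `Aws::Client::ClientConfiguration`. The helper name `make_config` and the 5000 ms value are assumptions made for this sketch, not values recommended anywhere in this thread.

```cpp
#include <aws/core/client/ClientConfiguration.h>

// Hypothetical helper: builds a client configuration with a longer
// socket connect timeout than the 1000 ms default quoted above.
Aws::Client::ClientConfiguration make_config() {
  Aws::Client::ClientConfiguration config;
  config.region = "ap-northeast-1";
  config.connectTimeoutMs = 5000;  // illustrative value only (assumption)
  return config;
}
```

The returned configuration would then be passed to the `DynamoDBClient` constructor as in the original program.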

Secondly, a usage recommendation. It looks like you create and destroy a new client in each worker thread. With each creation/destruction, a curl handle is created and destroyed. I would suggest using only one client.
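The single-client pattern can be sketched without the SDK. In the sketch below, `StubClient`, its `Get()` method, and `run_with_shared_client` are hypothetical stand-ins: in the reported program the shared object would be one `Aws::DynamoDB::DynamoDBClient` constructed once in `main`, and `Get()` would correspond to `client.GetItem(req)`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a client that is safe to call from many threads.
// It only counts how many calls it has served.
struct StubClient {
  mutable std::atomic<unsigned> calls{0};
  bool Get() const {
    ++calls;
    return true;  // a real client would report request success here
  }
};

// Runs `num_threads` workers that all call through the same `client`
// reference, so the client is constructed exactly once by the caller.
// Returns the number of successful calls.
template <typename Client>
unsigned run_with_shared_client(const Client& client, unsigned num_threads,
                                unsigned per_thread) {
  std::atomic<unsigned> ok{0};
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned i = 0; i < num_threads; ++i) {
    workers.emplace_back([&] {
      for (unsigned j = 0; j < per_thread; ++j) {
        if (client.Get()) ++ok;
      }
    });
  }
  for (auto& w : workers) w.join();
  return ok.load();
}
```

Calling `run_with_shared_client(client, 100, 5)` mirrors the 100 threads and 5 requests per thread of the original test, with no per-thread client construction.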

Let me know if you are still seeing the issue after the configuration changes.

sbiscigl Jun 02 '23 13:06

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

github-actions[bot] Jun 05 '23 00:06

Thanks, I'm trying your solution.

howz97 Jun 05 '23 02:06

I am running this test program on an EC2 c5.xlarge instance (ap-northeast-1c). This is my new code, using only one client: https://github.com/howz97/dynamo_test/tree/simple_share_client

I got this poor performance on Ubuntu 22.04:

finished counter 500
elapsed seconds 156

while I got the expected performance on Ubuntu 20.04:

finished counter 500
elapsed seconds 0

howz97 Jun 05 '23 02:06

Hi @howz97,

Thank you for providing an update. May I also suggest disabling enableEndpointDiscovery in your client configuration? For example:

  Aws::Client::ClientConfiguration config;
  config.region = "ap-northeast-1";
  config.enableEndpointDiscovery = false;
  client = std::make_unique<Aws::DynamoDB::DynamoDBClient>(config);

This feature performs an additional service call to "discover the actual endpoint to call". I'm sorry that it is enabled by default; that is legacy behavior, and I hope we will change the default to false in a future API version update.

Also, note that your test code measures thread creation as well. In my experience with the SDK, the very first call to DynamoDB is slightly slower (at least because of DNS resolution).

Please let us know if it improves the performance you observe.

Best regards, Sergey

SergeyRyabinin Jun 05 '23 04:06

I still got poor performance after setting enableEndpointDiscovery=false.

finished counter 500
elapsed seconds 148

Spending more than 2 minutes to request 500 items is too long, and this only happens on Ubuntu 22.04. The overhead of thread creation cannot be that high, because this program completes within 1 second on Ubuntu 20.04. Both test machines are EC2 c5.xlarge instances.

howz97 Jun 05 '23 05:06