aws-sdk-cpp CURL-based HTTP client ignores `requestTimeoutMs` for S3::PutObject

Confirm by changing [ ] to [x] below to ensure that it's a bug:

[x] I've gone though Developer Guide and API reference
[x] I've searched for previous similar issues and didn't find any solution

Describe the bug

I am running a large number of S3Client::PutObject requests (several thousand within a few seconds) and observe long tail latencies, i.e., a tiny fraction of them takes 10s or longer. In order to shave off these tail latencies, I want to abort slow requests early and retry them.

I thus configure the S3 client like this:

Aws::Client::ClientConfiguration cfg;
cfg.requestTimeoutMs = 1000;
cfg.lowSpeedLimit = 100 * 1000;

This works for GetObject requests; however, it has no effect on PutObject requests. I have thus written a custom HTTP client that uses CURL's progress function in order to enforce the corresponding timeout manually. This works exactly as intended; however, (1) I think this should be done in the AWS SDK C++ (or CURL) as this just implements behavior that the SDK claims already has and (2) it probably does not work with streaming requests (which seems to use CURL's progress function as well).

Here is the header file (http_client.hpp) of my HTTP client:

// Author: Ingo Müller <[email protected]>
// Copyright and related rights waived via CC0
#ifndef AWS_HTTP_CLIENT_HPP
#define AWS_HTTP_CLIENT_HPP

#include <memory>

#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/http/HttpClient.h>
#include <aws/core/http/HttpClientFactory.h>
#include <aws/core/http/curl/CurlHttpClient.h>

class HttpClientFactory : public Aws::Http::HttpClientFactory {
    [[nodiscard]] auto CreateHttpClient(
            const Aws::Client::ClientConfiguration& clientConfiguration) const
            -> std::shared_ptr<Aws::Http::HttpClient> override;

    [[nodiscard]] auto CreateHttpRequest(
            const Aws::String& uri, Aws::Http::HttpMethod method,
            const Aws::IOStreamFactory& streamFactory) const
            -> std::shared_ptr<Aws::Http::HttpRequest> override;

    [[nodiscard]] auto CreateHttpRequest(
            const Aws::Http::URI& uri, Aws::Http::HttpMethod method,
            const Aws::IOStreamFactory& streamFactory) const
            -> std::shared_ptr<Aws::Http::HttpRequest> override;
};

#endif  // AWS_HTTP_CLIENT_HPP

Here is the source file (http_client.cpp):

// Author: Ingo Müller <[email protected]>
// Copyright and related rights waived via CC0
#include "http_client.hpp"

#include <chrono>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/http/HttpClient.h>
#include <aws/core/http/HttpClientFactory.h>
#include <aws/core/http/curl/CurlHttpClient.h>
#include <aws/core/http/standard/StandardHttpRequest.h>
#include <boost/format.hpp>
#include <curl/curl.h>

using boost::format;
using namespace std::chrono_literals;

// How often should the progress be traced
static constexpr auto kReportFrequency = 1ms;

// What is the maximim time with zero progress after which requests should be
// aborted
static constexpr auto kMaxIdleDuration = 400ms;

struct RequestInfo {
    std::chrono::steady_clock::time_point start;
    std::chrono::steady_clock::time_point last_report;
    curl_off_t last_reported_dlnow;
    curl_off_t last_reported_ulnow;
    std::chrono::steady_clock::time_point last_dlnow_time;
    std::chrono::steady_clock::time_point last_ulnow_time;
    curl_off_t last_dlnow;
    curl_off_t last_ulnow;
    size_t num_calls;
};

void Trace(const std::string &message) {
    static std::mutex trace_mutex;
    std::lock_guard<std::mutex> l(trace_mutex);
    std::cerr << std::chrono::duration_cast<std::chrono::microseconds>(
                         std::chrono::system_clock::now().time_since_epoch())
                         .count()
              << ": [" << std::this_thread::get_id() << "] " << message
              << std::endl;
}

// Called by CURL "frequently". `clientp` is user-provided; we use the request
// state provided by our HttpClient.
auto progress_callback(void* const clientp, const curl_off_t dltotal,
                       const curl_off_t dlnow, const curl_off_t ultotal,
                       const curl_off_t ulnow) -> int {
    auto const now = std::chrono::steady_clock::now();

    auto& request = *reinterpret_cast<RequestInfo*>(clientp);
    request.num_calls++;

    // Update if request has made progress
    if (dlnow > request.last_dlnow) {
        request.last_dlnow = dlnow;
        request.last_dlnow_time = now;
    }
    if (ulnow > request.last_ulnow) {
        request.last_ulnow = ulnow;
        request.last_ulnow_time = now;
    }

    // Compute when the last prgress was made and if we should abort
    auto const no_dlnow_since =
            std::chrono::duration_cast<std::chrono::milliseconds>(
                    now - request.last_dlnow_time);
    auto const no_ulnow_since =
            std::chrono::duration_cast<std::chrono::milliseconds>(
                    now - request.last_ulnow_time);
    auto const no_progress_since = std::min(no_dlnow_since, no_ulnow_since);
    auto const do_abort_request = no_progress_since > kMaxIdleDuration;

    // Report current progress
    if (request.num_calls == 1 or do_abort_request or
        now - request.last_report > kReportFrequency) {
        // Compute metrics
        auto const request_duration =
                std::chrono::duration_cast<std::chrono::milliseconds>(
                        now - request.start)
                        .count();
        auto const report_duration =
                std::chrono::duration_cast<std::chrono::milliseconds>(
                        now - request.last_report)
                        .count();
        auto const req_dspeed = 1.0 * dlnow / request_duration;
        auto const req_uspeed = 1.0 * ulnow / request_duration;
        auto const cur_dspeed =
                1.0 * static_cast<double>(dlnow - request.last_reported_dlnow) /
                report_duration;
        auto const cur_uspeed =
                1.0 * static_cast<double>(ulnow - request.last_reported_ulnow) /
                report_duration;

        // Print report
        Trace((format("CURL progress: "
                      "clientp=%1%, i=%2%, request_duration=%3%ms, "
                      "report_duration=%4%ms, dltotal=%5%, ultotal=%6%, "
                      "dlnow=%7%, ulnow=%8%, req_dspeed=%9%kb/s, "
                      "req_uspeed=%10%kb/s, cur_dspeed=%11%kb/s, "
                      "cur_uspeed=%12%kb/s, no_dlnow_since=%13%ms, "
                      "no_ulnow_since=%14%ms") %  //
               clientp %
               request.num_calls % request_duration % report_duration %
               dltotal % ultotal % dlnow % ulnow % req_dspeed % req_uspeed %
               cur_dspeed % cur_uspeed % no_dlnow_since.count() %
               no_ulnow_since.count())
                      .str());

        // Track progress since last report
        request.last_report = now;
        request.last_reported_dlnow = dlnow;
        request.last_reported_ulnow = ulnow;
    }

    // Signal to CURL that request should be aborted
    if (do_abort_request) {
        Trace((format("Request %1% has not made progress for too long! "
                      "Aborting...") %
               clientp)
                      .str());
        return 1;
    }

    return 0;
}

class HttpClient : public Aws::Http::CurlHttpClient {
    using Base = Aws::Http::CurlHttpClient;

public:
    explicit HttpClient(const Aws::Client::ClientConfiguration& clientConfig)
        : Base(clientConfig) {}

    // XXX: Overriding this is optional
    // NOLINTNEXTLINE(google-default-arguments)
    auto MakeRequest(
            const std::shared_ptr<Aws::Http::HttpRequest>& request,
            Aws::Utils::RateLimits::RateLimiterInterface* readLimiter = nullptr,
            Aws::Utils::RateLimits::RateLimiterInterface* writeLimiter =
                    nullptr) const
            -> std::shared_ptr<Aws::Http::HttpResponse> override {
        Trace("MakeRequest called.");
        return Base::MakeRequest(request, readLimiter, writeLimiter);
    }

protected:
    void OverrideOptionsOnConnectionHandle(
            CURL* const connectionHandle) const override {
        Trace((format("Activating progress tracking of CURL request %1%") %
               connectionHandle)
                      .str());

        // Create state for this request
        auto const now = std::chrono::steady_clock::now();
        {
            std::lock_guard l(requests_mutex_);
            requests_.emplace_back(std::make_unique<RequestInfo>(
                    RequestInfo{now, now, 0, 0, now, now, 0, 0, 0}));
        }

        // Set up CURL's progress function
        curl_easy_setopt(connectionHandle, CURLOPT_NOPROGRESS, 0L);
        curl_easy_setopt(connectionHandle, CURLOPT_XFERINFODATA,
                         requests_.back().get());
        curl_easy_setopt(connectionHandle, CURLOPT_XFERINFOFUNCTION,
                         progress_callback);
    }

private:
    // Keep the state of requests such that CURL's progress function can track
    // their progress between calls. Since the client owns them, they are all
    // destroyed when the client is destroyed, at which point all requests have
    // finished.
    mutable std::vector<std::unique_ptr<RequestInfo>> requests_;
    mutable std::mutex requests_mutex_;
};

auto HttpClientFactory::CreateHttpClient(
        const Aws::Client::ClientConfiguration& clientConfiguration) const
        -> std::shared_ptr<Aws::Http::HttpClient> {
    Trace("CreateHttpClient called.");
    return std::make_shared<HttpClient>(clientConfiguration);
}

// Copied from aws-cpp-sdk-core/source/http/HttpClientFactory.cpp
auto HttpClientFactory::CreateHttpRequest(
        const Aws::String& uri, const Aws::Http::HttpMethod method,
        const Aws::IOStreamFactory& streamFactory) const
        -> std::shared_ptr<Aws::Http::HttpRequest> {
    Trace("CreateHttpRequest called.");
    return CreateHttpRequest(Aws::Http::URI(uri), method, streamFactory);
}

// Copied from aws-cpp-sdk-core/source/http/HttpClientFactory.cpp
auto HttpClientFactory::CreateHttpRequest(
        const Aws::Http::URI& uri, const Aws::Http::HttpMethod method,
        const Aws::IOStreamFactory& streamFactory) const
        -> std::shared_ptr<Aws::Http::HttpRequest> {
    Trace("CreateHttpRequest called.");
    auto request = std::make_shared<Aws::Http::Standard::StandardHttpRequest>(
            uri, method);
    request->SetResponseStreamFactory(streamFactory);

    return request;
}

SDK version number

1.9.94

Platform/OS/Hardware/Device the Lambda Python 3.7 environment based on Amazon Linux.

Sep 07 '21 16:09 ingomueller-net

Hi @ingomueller-net , Sorry for the lack of response here but quick question, wouldn't the value httpRequestTimeoutMs be what you are looking for? I believe requestTimeoutMs is supposed to be used for windows and ignored by curl.

Dec 17 '21 00:12 KaibaLopez

@KaibaLopez, I am not sure what the exact behavior of using httpRequestTimeoutMs was and whether or not it applied to uploads. However, it sets CURLOPT_TIMEOUT_MS, which is a timeout for the whole request, i.e., it sets the same absolute timeout for requests of any size. This isn't ideal: too large values may be to loose for small requests while too small values may make it impossible for large requests to finish.

The configuration I posted in the original post is ideal because it essentially only aborts requests that make zero progress and are thus obviously stuck. It does work exactly as intended for downloads and there is nothing in the documentation that suggests it wouldn't do the same for uploads, but as a matter of fact, it doesn't.

Dec 17 '21 06:12 ingomueller-net

Greetings! Sorry to say but this is a very old issue that is probably not getting as much attention as it deservers. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

Mar 03 '23 01:03 github-actions[bot]

aws-sdk-cpp aws-sdk-cpp copied to clipboard

CURL-based HTTP client ignores `requestTimeoutMs` for S3::PutObject

aws-sdk-cpp
aws-sdk-cpp copied to clipboard