aws-sdk-js-v3

S3Client Got Corrupted on a particular container due to TimeoutError causing S3 writes to fail

Open rishi2808-ds opened this issue 1 year ago • 3 comments

Describe the bug

The issue arises because when the AWS credentials expire, the AWS SDK makes a call to fetch new credentials and cache them using the memoize method. If this fetch operation fails and results in a TimeoutError, the AWS SDK’s memoize method caches this error. Consequently, subsequent calls retrieve the TimeoutError from the cache instead of attempting to fetch new credentials from AWS.
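The failure mode described above can be illustrated with a simplified memoizer. This is a minimal sketch and not the SDK's actual `memoize` implementation: `naiveMemoize`, `makeFlakyProvider`, and `demoNaive` are hypothetical names, and the credential values are placeholders. It only shows how storing the first promise causes a rejection to be replayed to every later caller without the provider running again.

```javascript
// Hypothetical, simplified memoizer (NOT the SDK's memoize): it stores the
// first promise returned by the provider and hands it to every later caller.
function naiveMemoize(provider) {
  let cached; // holds the first promise, whether it resolves or rejects
  return () => {
    if (cached === undefined) {
      cached = provider();
    }
    return cached;
  };
}

// Hypothetical provider that fails on its first call only, standing in for
// a credential fetch that times out once.
function makeFlakyProvider() {
  let calls = 0;
  const provider = async () => {
    calls += 1;
    if (calls === 1) {
      const err = new Error("simulated timeout");
      err.name = "TimeoutError";
      throw err;
    }
    return { accessKeyId: "EXAMPLE", secretAccessKey: "EXAMPLE" };
  };
  provider.callCount = () => calls;
  return provider;
}

// Calls the memoized provider three times and records each outcome.
async function demoNaive() {
  const provider = makeFlakyProvider();
  const getCredentials = naiveMemoize(provider);
  const outcomes = [];
  for (let i = 0; i < 3; i++) {
    try {
      await getCredentials();
      outcomes.push("ok");
    } catch (err) {
      outcomes.push(err.name);
    }
  }
  return { outcomes, providerCalls: provider.callCount() };
}
```

In this sketch all three calls fail with the same `TimeoutError` even though the provider would have succeeded on a second attempt, because the provider is invoked only once.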

To reproduce this issue locally, we removed the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method for obtaining credentials, which mirrors the behaviour in the remote environment.

We then explicitly threw an error within the AWS SDK and observed that while the first attempt to fetch credentials triggered an API call to the Instance Metadata Service, subsequent attempts retrieved the error from the cache instead of making fresh API calls to Instance Metadata Service.

Below is a screenshot showing the values of the hasResult and result variables in the memoize method, verifying that the TimeoutError is indeed being cached.

image-20240731-194758

Additional logs added.

Screenshot 2024-08-01 at 12 28 34 PM

The first time it is called, we can see the added logs.

Screenshot 2024-07-31 at 9 43 12 PM (1)

Subsequent calls do not show added logs in the AWS SDK, indicating that no new API calls are being made. Instead, we continue to see TimeoutError logs, which means the error is being retrieved from the cache.

Screenshot 2024-07-31 at 7 29 08 PM (1)

SDK version number

aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.10.0

Reproduction Steps

To reproduce the issue locally:

Explicitly throw a TimeoutError in the httpRequest function located in node_modules/@aws-sdk/credential-provider-imds/dist/cjs/remoteProvider/httpRequest.js. When this function is called for the first time, it triggers and throws a TimeoutError, which then gets cached in Memoize. On subsequent calls, the function is not invoked again; instead, the cached TimeoutError is returned.

Observed Behavior

We then explicitly threw an error within the AWS SDK and observed that while the first attempt to fetch credentials triggered an API call to the Instance Metadata Service, subsequent attempts retrieved the error from the cache instead of making fresh API calls to Instance Metadata Service.

Expected Behavior

Subsequent calls should query the Instance Metadata Service again to fetch credentials, even when a TimeoutError has been stored in the cache.

Possible Solution

A cached TimeoutError should not be served to later callers: subsequent calls should query the Instance Metadata Service again to fetch credentials.
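One way to picture the requested behavior is a memoizer that keeps successful results but forgets failures. The following is a minimal sketch under stated assumptions, not a patch to the SDK: `memoizeSuccessOnly` and `demoSuccessOnly` are hypothetical names, and credential expiry and refresh are deliberately left out to keep the idea visible.

```javascript
// Hypothetical memoizer that caches successes only. This illustrates the
// requested behavior; it is not the SDK's implementation and it ignores
// credential expiry.
function memoizeSuccessOnly(provider) {
  let cached; // in-flight or fulfilled promise; cleared after a rejection
  return () => {
    if (cached === undefined) {
      // Promise.resolve().then(...) also converts a synchronous throw
      // inside the provider into a rejection.
      const attempt = Promise.resolve().then(provider);
      cached = attempt;
      attempt.catch(() => {
        // Forget the failed attempt so the next caller retries.
        if (cached === attempt) cached = undefined;
      });
    }
    return cached;
  };
}

// Calls the memoized provider three times against a provider that fails
// on its first invocation only.
async function demoSuccessOnly() {
  let calls = 0;
  const provider = async () => {
    calls += 1;
    if (calls === 1) throw new Error("simulated timeout");
    return { accessKeyId: "EXAMPLE" };
  };
  const getCredentials = memoizeSuccessOnly(provider);
  const outcomes = [];
  for (let i = 0; i < 3; i++) {
    try {
      await getCredentials();
      outcomes.push("ok");
    } catch {
      outcomes.push("error");
    }
  }
  return { outcomes, providerCalls: calls };
}
```

Here the first call fails, the second call triggers a fresh fetch and succeeds, and the third call reuses the cached success, so the provider runs twice in total. Concurrent callers still share a single in-flight attempt.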

Additional Information/Context

No response

rishi2808-ds avatar Aug 01 '24 06:08 rishi2808-ds

@rishi2808-ds Thanks for posting this in detail. I am seeing a similar issue in my production environment as well. The error gets cached causing all subsequent requests to fail for the container.

giri-sh-irke avatar Aug 01 '24 07:08 giri-sh-irke

Hi @rishi2808-ds - thanks for reaching out and providing the detailed explanation.

To better understand and investigate this issue, it would be helpful if you could provide a minimal reproducible code snippet or example. Having a concise code sample that demonstrates the problem you're encountering will assist me in reproducing and analyzing the issue more effectively on my end.

While the information you've provided so far is valuable, having a minimal reproducible code example will allow me to isolate the problem and potentially uncover any nuances or edge cases related to the credential fetching and caching behavior you've described.

Please feel free to share a simplified version of your code, ensuring that it captures the essence of the issue without any unnecessary complexities. This will streamline the investigation process and enable us to collaborate more efficiently in identifying the root cause and potential solutions.

Best, John

aBurmeseDev avatar Aug 06 '24 02:08 aBurmeseDev

To reproduce this issue locally, remove the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method for obtaining credentials, which mirrors the behaviour in the remote environment.

Explicitly throw a TimeoutError in the httpRequest function located in node_modules/@aws-sdk/credential-provider-imds/dist/cjs/remoteProvider/httpRequest.js. When this function is called for the first time, it triggers and throws a TimeoutError, which then gets cached in Memoize. On subsequent calls, the function is not invoked again; instead, the cached TimeoutError is returned.

Below are the code changes we made in the httpRequest.js file.

Object.defineProperty(exports, "__esModule", { value: true });
exports.httpRequest = void 0;
const property_provider_1 = require("@aws-sdk/property-provider");
const buffer_1 = require("buffer");
const http_1 = require("http");
var flag1 = false;
/**
 * @internal
 */
function httpRequest(options) {
    return new Promise((resolve, reject) => {
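        if (!flag1) {
            // Fault injection: reject the first call only, to simulate a timeout.
            console.log("http----->0")
            flag1 = true
            reject(new Error("TimeoutError1"));
            return; // skip the real request once the promise has been rejected
        }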
        const req = http_1.request({ method: "GET", ...options });
        console.log("http----->1")
        req.on("error", (err) => {
            reject(Object.assign(new property_provider_1.ProviderError("Unable to connect to instance metadata service"), err));
        });
        req.on("timeout", () => {
            reject(new Error("TimeoutError"));
        });
        req.on("response", (res) => {
            const { statusCode = 400 } = res;
            if (statusCode < 200 || 300 <= statusCode) {
                reject(Object.assign(new property_provider_1.ProviderError("Error response received from instance metadata service"), { statusCode }));
            }
            const chunks = [];
            res.on("data", (chunk) => {
                chunks.push(chunk);
            });
            res.on("end", () => {
                resolve(buffer_1.Buffer.concat(chunks));
            });
        });
        req.end();
    });
}
exports.httpRequest = httpRequest;
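The same one-shot failure can be sketched without editing files under node_modules, which may make a shareable reproduction easier to build. This is a hedged illustration only: `failOnce` is a hypothetical helper, and whether a function wrapped this way passes through the same caching path as the patched httpRequest is an assumption that would need to be checked against the SDK version in use.

```javascript
// Hypothetical helper: wraps any promise-returning function so that its
// first invocation rejects and every later invocation delegates to the
// original function.
function failOnce(fn, error = new Error("TimeoutError")) {
  let failed = false;
  return (...args) => {
    if (!failed) {
      failed = true;
      // Return early so the wrapped function is not called at all.
      return Promise.reject(error);
    }
    return fn(...args);
  };
}
```

For example, wrapping a function that fetches credentials gives a provider whose first call rejects and whose later calls behave normally, which is the same pattern as the `flag1` patch above.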

rishi2808-ds avatar Aug 07 '24 11:08 rishi2808-ds

Apologies for the delay. I'm not able to reproduce this in a recent version (v3.919.0). If the issue persists with any recent version, please create a new issue with a minimal reproducible code example.

aBurmeseDev avatar Oct 29 '25 21:10 aBurmeseDev

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Oct 29 '25 21:10 github-actions[bot]

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

github-actions[bot] avatar Nov 13 '25 00:11 github-actions[bot]