aws-sdk-js-v3 icon indicating copy to clipboard operation
aws-sdk-js-v3 copied to clipboard

SignatureDoesNotMatch when migrating from AWS SDK v2 to v3

Open gastonlifschitz opened this issue 2 years ago • 26 comments

Checkboxes for prior research

Describe the bug

I'm trying to migrate my codebase from v2 to v3 (like the title suggests) and I'm getting the following error:

{"name":"SignatureDoesNotMatch","$fault":"client","$metadata":{"httpStatusCode":403,"requestId":"76a6508d-7495-4152-8915-3b082272f2dc","attempts":1,"totalRetryDelay":0},"Type":"Sender","Code":"SignatureDoesNotMatch","message":"Signature expired: 20230907T155358Z is now earlier than 20230907T155403Z (20230907T155903Z - 5 min.)"}

This is my code I'm trying to replicate:

const client = new CloudWatchClient({
  region: 'us-east-1',
  credentials: fromEnv(),
})

export class CloudWatchUtils {
  async logMySqlResponseTimes(millis: number, operation: string) {
    const params = this.getMillisParams(operation, 'MySql', millis)

    try {
      return await client.send(new PutMetricDataCommand(params))
    } catch (e) {
      logger.error(`Could not log mysql response time to cloudwatch: ${JSON.stringify(e)}`)
    }
  }
}

And my code used to be like this (with v2):

const cloudwatch = new CloudWatch()

export class CloudWatchUtils {
  async logMySqlResponseTimes(millis: number, operation: string) {
    const params = this.getMillisParams(operation, 'MySql', millis)

    try {
      await cloudwatch.putMetricData(params).promise()
    } catch (error) {
      logger.error(`Could not log mysql response time ${error}`)
    }
  }
}

There's only one thing to consider: The code you are seeing is executed but we are not awaiting for the response when we call logMySqlResponseTimes since we are not interested in getting a result back.

With that being said, the sdk v3 throws the SignatureDoesNotMatch exception and the one with the v2 does not.

Also, tried using requestHandler with a maxSockets number (like 60/70) but it still didn't work and I want my code to be agnostic of any infraestructure or traffic that may be going on.

Please note that this is happening on a very busy lambda with around 1.7k invocations per minute (around 8-25 concurrent executions)

SDK version number

@aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v18.17.0

Reproduction Steps

This is where we comunicate with cloudwatch

import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch'
import { fromEnv } from '@aws-sdk/credential-providers'

const client = new CloudWatchClient({
  region: 'us-east-1',
  credentials: fromEnv(),
})

export class CloudWatchUtils {
  async logMySqlResponseTimes(millis: number, operation: string) {
    const params = this.getMillisParams(operation, 'MySql', millis)

    try {
      return await client.send(new PutMetricDataCommand(params))
    } catch (e) {
      logger.error(`Could not log mysql response time to cloudwatch: ${JSON.stringify(e)}`)
    }
  }


private getMillisParams(operation: string, service: string, millis: number) {
    return {
      Namespace: this.getNameSpace(),
      MetricData: [
        {
          MetricName: operation,
          // This might seem counterintuitive, but this way we have all
          // metrics grouped by their respective service
          Dimensions: [{ Name: service, Value: 'Milliseconds' }],
          Timestamp: new Date(),
          Value: millis,
          Unit: 'Milliseconds',
        },
      ],
    }
  }
}

And here where we use it:

import { CloudWatchUtils } from '../utils/CloudWatchUtils'

const NS_PER_SEC = 1e9
const MS_PER_NS = 1e6

type Function = () => Promise<any>
type Chronometer = (millis: number, operation: string) => Promise<void>

export function getProbabilisticChronometer(p: number, chron: Chronometer): Chronometer {
  if (p < 0 || p > 1) throw new Exception("'p' must be in the [0,1] range")
  return async (millis: number, operation: string) => {
    if (Math.random() < p) await chron(millis, operation)
  }
}

export async function timeIt(func: Function, chron: Chronometer, operation: string) {
  const time = process.hrtime()
  const ans = await func()
  const millis = hrtimeToMillis(process.hrtime(time))
  // Not awaiting, since we are not interested in getting a result back
  chron(millis, operation)
  return ans
}


const mysql = require('serverless-mysql')()

mysql.config({
  host: process.env.MYSQL_HOST,
  database: process.env.MYSQL_DATABASE,
  user: process.env.MYSQL_USER,
  password: process.env.MYSQL_PASSWORD,
})

export const enum MySQLTables {
  Devices = 'devices',
}

const CHRON_PROBABILITY = parseFloat(0.2)

const cloudWatch = new CloudWatchUtils()

const chron = getProbabilisticChronometer(
  CHRON_PROBABILITY,
  cloudWatch.logMySqlResponseTimes.bind(cloudWatch)
)

export class MySQLStorage {
  async select(table: MySQLTables, columnValues: object) {
    const valuesCount = Object.keys(columnValues).length

    const equalConditions = new Array(valuesCount).fill('?? = ?').join(' AND ')

    const query = `SELECT * FROM ?? WHERE ${equalConditions}`

    const queryValues = [table].concat(...Object.entries(columnValues))

    return await timeIt(() => mysql.query(query, queryValues), chron, 'QUERY')
  }
}

Since this may be a little complicated, the only thing important is that we do not await for the cloudwatch method. And for some context, this is not done all the time, only 20% of the time.

Observed Behavior

It seems that it throws error:

{"name":"SignatureDoesNotMatch","$fault":"client","$metadata":{"httpStatusCode":403,"requestId":"76a6508d-7495-4152-8915-3b082272f2dc","attempts":1,"totalRetryDelay":0},"Type":"Sender","Code":"SignatureDoesNotMatch","message":"Signature expired: 20230907T155358Z is now earlier than 20230907T155403Z (20230907T155903Z - 5 min.)"}

Expected Behavior

It should not throw that error and log the metric to cloudwatch

Possible Solution

No response

Additional Information/Context

No response

gastonlifschitz avatar Sep 07 '23 18:09 gastonlifschitz

Hi @gastonlifschitz, sorry to hear about your issues. Could you please confirm with me if this error happens sporadically or happen every time you try to run your code?

Please let me know.

Thanks!

yenfryherrerafeliz avatar Sep 11 '23 13:09 yenfryherrerafeliz

Hey @yenfryherrerafeliz! Yes, this error happens sporadically (after 5 min of deployment) but after those 5 minutes it becomes regularly and almost all the time

gastonlifschitz avatar Sep 11 '23 15:09 gastonlifschitz

Hey @yenfryherrerafeliz! Any update?

gastonlifschitz avatar Sep 20 '23 16:09 gastonlifschitz

Hi @gastonlifschitz, this issue can possibly happen if the time of the machine, where the SDK is being used, has a wrong time or timezone, which causes the signature to be calculated with an earlier time. Could you please confirm that that is not the case here?. You can also check the following post here, that provides more information regarding this.

Please let me know if that helps!

Thanks!

yenfryherrerafeliz avatar Sep 26 '23 19:09 yenfryherrerafeliz

Hey there! I Can confirm that this is not the case because I’m running on lambda and it happened when I magreares from v2 to v3 with the same use case and same lambda location.

gastonlifschitz avatar Sep 26 '23 21:09 gastonlifschitz

Hi there, seeing a similar thing since migrating from v2 to v3 where we are quite regularly getting errors like this:

Signature expired: 20231020T104900Z is now earlier than 20231020T105332Z (20231020T105832Z - 5 min.)

I have also tried upping the max connections to 100 but still seeing this issue that never occured on v2 on exactly the same system with exactly the same load.

It is possible we are calling out too much to cloudwatch from our lambdas but it is just odd that we never saw this error on v2. Was something changed around how the sdk request queue is handled?

emmamoinat avatar Oct 20 '23 11:10 emmamoinat

We are also experiencing this issue, although in our case it's occurring sporadically when calling out to SSM from a lambda.

We did not see this behaviour previously on the V2 SDK, only on the V3 SDK.

xanido avatar Oct 24 '23 00:10 xanido

Hi @gastonlifschitz, @emmapatterson, and @xanido. The issue that you folks are getting could be caused for different factors, but all of them have something in common and is that the signature in the request has a non expected time, which in this case is earlier than the time in the server processing the request. I have not found a specific reproduction scenario that I can investigate, but, for what I have read/found so far, this issue could be caused if using provisioned concurrency in lambda, which probably pre-initialize the requests to be sent, and then, when lambda is up and the requests are sent, the time elapsed in between is too much which causes the signature error.

To better keep investigating this, could you please confirm if you are using provisioned concurrency?

I look forward to your response.

Thanks!

yenfryherrerafeliz avatar Oct 31 '23 02:10 yenfryherrerafeliz

Hi @gastonlifschitz, @emmapatterson, and @xanido. The issue that you folks are getting could be caused for different factors, but all of them have something in common and is that the signature in the request has a non expected time, which in this case is earlier than the time in the server processing the request. I have not found a specific reproduction scenario that I can investigate, but, for what I have read/found so far, this issue could be caused if using provisioned concurrency in lambda, which probably pre-initialize the requests to be sent, and then, when lambda is up and the requests are sent, the time elapsed in between is too much which causes the signature error.

To better keep investigating this, could you please confirm if you are using provisioned concurrency?

I look forward to your response.

Thanks!

I can confirm in our case we are not using provisioned concurrency. In our scenario we have many lambdas all calling out to cloudwatch to audit certain calls.

We attempted some of the suggestions here: https://repost.aws/knowledge-center/lambda-sdk-signature

but we didn't see an improvement sadly. Thanks, Emma

emmamoinat avatar Oct 31 '23 14:10 emmamoinat

@yenfryherrerafeliz any update on this? It is getting very frustrating as we have constant errors coming in and we are also missing out on some required functionality because of this problem. If there is no upcoming solution we might have to downgrade to v2 again until this is solved. Thanks in advance!

emmamoinat avatar Nov 02 '23 12:11 emmamoinat

Hi @yenfryherrerafeliz, thank you for your reply!

We did indeed see this issue on Lambdas with provisioned concurrency. In fact, it was so frequent on Lambdas with provisioned concurrency that we were forced to disable provisioned concurrency to restore some stability to the application.

However, will caveat this by saying that we still see the issue on Lambdas without provisioned concurrency. It is less frequent, but still an issue.

xanido avatar Nov 06 '23 00:11 xanido

Hi everyone on the thread,

We are working alongside the lambda team on this issue with priority. I think the SignatureDoesNotMatch error might be occurring due to a delay between the Lambda function's initialization phase and its invocation. This delay could cause signatures generated during initialization to expire before the actual API call is made. This issue is particularly pertinent in scenarios involving provisioned concurrency, where Lambda may pre-initialize execution environments.

To better understand and diagnose the issue, it would be helpful to have:

  • A minimal code reproduction that includes the Lambda handler. Specifically, it's crucial to know whether the SDK calls (resulting in the signature generation) are made inside or outside the Lambda handler. If these calls are made outside the handler (during initialization), that might explain the signature expiration issue due to the time gap between initialization and invocation.

  • Which SDK version you are using and how? For example, some regions have the newer v3.362.0 embedded with the lambda runtime, some regions are still with v3.188.0. Are you relying on the embedded version, or providing the SDK as a layer, or bundling it and uploading to lambda?

Thanks, Ran~

RanVaknin avatar Nov 10 '23 18:11 RanVaknin

Hi everyone on the thread,

We are working alongside the lambda team on this issue with priority. I think the SignatureDoesNotMatch error might be occurring due to a delay between the Lambda function's initialization phase and its invocation. This delay could cause signatures generated during initialization to expire before the actual API call is made. This issue is particularly pertinent in scenarios involving provisioned concurrency, where Lambda may pre-initialize execution environments.

To better understand and diagnose the issue, it would be helpful to have:

  • A minimal code reproduction that includes the Lambda handler. Specifically, it's crucial to know whether the SDK calls (resulting in the signature generation) are made inside or outside the Lambda handler. If these calls are made outside the handler (during initialization), that might explain the signature expiration issue due to the time gap between initialization and invocation.
  • Which SDK version you are using and how? For example, some regions have the newer v3.362.0 embedded with the lambda runtime, some regions are still with v3.188.0. Are you relying on the embedded version, or providing the SDK as a layer, or bundling it and uploading to lambda?

Thanks, Ran~

Hi Ran,

In our case we aren't using provisioned concurrency for any of the lambdas that are failing with this error.

A minimal code reproduction:

Our CloudWatchLogsService:

import {
  CloudWatchLogs,
  CreateLogStreamCommand,
  CreateLogStreamCommandInput,
  CreateLogStreamCommandOutput,
  PutLogEventsCommand,
  PutLogEventsCommandInput,
  PutLogEventsCommandOutput
} from "@aws-sdk/client-cloudwatch-logs";
import {captureAWSv3Client} from "aws-xray-sdk-core";

export const cloudWatchLogs = captureAWSv3Client(new CloudWatchLogs({region: "eu-west-2"}));

export function createLogStream(createLogStreamRequest: CreateLogStreamCommandInput): Promise<CreateLogStreamCommandOutput> {
  return cloudWatchLogs.send(new CreateLogStreamCommand(createLogStreamRequest));
}

export function putLogEvents(putLogEventsRequest: PutLogEventsCommandInput): Promise<PutLogEventsCommandOutput> {
  return cloudWatchLogs.send(new PutLogEventsCommand(putLogEventsRequest));
}

Our Audit logger service does the following (adjusted a bit for more clarity):

async function auditLog(event: string): Promise<void> {
  const timeAuditLoggedAt = now();
  const logStreamName = UUID();

  try {
    await createLogStream({
      logGroupName: "our-audit-log-group",
      logStreamName
    });
  } catch (error) {
    if (error.message.includes(LOG_STREAM_ALREADY_EXISTS_ERROR_MESSAGE)) {
      logger.warn(`Audit log stream [${logStreamName}] already exists`);
    } else {
      logger.error(`error creating audit log stream [${logStreamName}]: [${error.toString()}]`);
      return;
    }
  }

  const putLogEventsRequest = {
    logEvents: [{
      timestamp: timeAuditLoggedAt.getTime(),
      message: event
    }],
    logGroupName: "our-audit-log-group",
    logStreamName
  };

  try {
     await putLogEvents(request);
  } catch (error) {
    logger.error(`error putting audit log events [${JSON.stringify({
      timestamp: timeAuditLoggedAt.getTime(),
      message: event
    })}] into log stream [${logStreamName}]`, error);
  }
}

Here is an example of how a lambda handler uses the above code:

export async function lambdaHandler(event: TriggerSourceEvent): Promise<TriggerSourceEvent> {
 try {
   //do some things for the lambda
   auditLog("All good in this lambda");
   return event;
  } catch (error) {
    auditLog("Something went wrong in this lambda");
    throw error;
}

AWS-SDK version: 3.429.0

We use esbuild to bundle up our lambdas but since our lambdas are Node18 we don't include the aws-sdk as an external dep at build time.

emmamoinat avatar Nov 13 '23 17:11 emmamoinat

Hi everyone on the thread,

We are working alongside the lambda team on this issue with priority. I think the SignatureDoesNotMatch error might be occurring due to a delay between the Lambda function's initialization phase and its invocation. This delay could cause signatures generated during initialization to expire before the actual API call is made. This issue is particularly pertinent in scenarios involving provisioned concurrency, where Lambda may pre-initialize execution environments.

To better understand and diagnose the issue, it would be helpful to have:

  • A minimal code reproduction that includes the Lambda handler. Specifically, it's crucial to know whether the SDK calls (resulting in the signature generation) are made inside or outside the Lambda handler. If these calls are made outside the handler (during initialization), that might explain the signature expiration issue due to the time gap between initialization and invocation.
  • Which SDK version you are using and how? For example, some regions have the newer v3.362.0 embedded with the lambda runtime, some regions are still with v3.188.0. Are you relying on the embedded version, or providing the SDK as a layer, or bundling it and uploading to lambda?

Thanks, Ran~

Hello everyone, As @xanido pointed out, we initially identified this issue with provisioned concurrency, and we had to disable it to restore stability to the application. Although the problem persists, it occurs less frequently now.

We utilize the middy-ssm middleware for fetching SSM Parameter Store parameters. To make our example more relevant to our case, I've created the following minimal code reproduction.

import middy from "@middy/core";
import ssm from '@middy/ssm'

const impl = async (event: any, context: any) => {

    try {
        const response = {
            statusCode: 200,
            body: JSON.stringify({ message: 'Hello ' + context.configuration }),
        };
        return response;
    } catch (error) {
        console.error('Error:', error);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: 'Internal Server Error' }),
        };
    }
};

export const handler = middy()
    .use(ssm({
        fetchData: {
            configuration: '/sample-configuration'
        },
        setToContext: true
    }))
    .handler(impl);

We use AWS-SDK version 3.262.0. Our lambdas are Node18 so we don't include the aws-sdk.

Our region : ap-southeast-2

Other dependencies

@middy/core: 4.2.1,
@middy/ssm: 4.2.1,
aws-cdk-lib: 2.63.0,

Working example: https://github.com/kanishka3000/recreate

kanishka3000 avatar Nov 20 '23 22:11 kanishka3000

Hey @RanVaknin - Are there any updates regarding this issue? Is this issue prioritised? Thank you kindly

bri-liu avatar Nov 23 '23 00:11 bri-liu

@RanVaknin any news? Thanks

emmamoinat avatar Nov 27 '23 20:11 emmamoinat

Hi there,

The issue is being investigated internally both by the Lambda team and the SDK team. I promise to keep you all up to date.

Thanks, Ran~

RanVaknin avatar Nov 29 '23 00:11 RanVaknin

I'll provide a minimal repro for InvalidSignatureException in next comment.


@gastonlifschitz Looks like in the v3 equivalent code for logMySqlResponseTimes you call return await although the returned value is not consumed anywhere.

    try {
      return await client.send(new PutMetricDataCommand(params))
    }

It shouldn't make any difference, but can you change it to await and see if this issue is reproducible? The minimal repro provided by other users do not use return though.

I don't see handler in your sample code though.


@emmapatterson Thanks for providing a code sample.

I don't see any issue with the way you're calling SDKs in the Lambda. It's a simple API call, and there shouldn't be any time between signature computation and request being sent.


@kanishka3000 Thank you for providing a code sample.

It uses middy which calls the SDK on your behalf. It would be helpful if the sample code calls the SDK directly. Do share if you have equivalent code for that, if you'd attempted in the past.


@brian-homepass

We have several code samples, but we're still not able to repro it reliably. If you can provide a sample code, it would be awesome.

trivikr avatar Dec 01 '23 23:12 trivikr

The SDK only signs requests in the process of executing an API operation. By default we do not allow any significant delay between signing a request and sending it over the network.

Details:

  • The AWS SDK for JavaScript v3 uses Middleware Stack - object that hosts all the middleware in specific order, and executes it.
  • The signing of the request happens in awsAuthMiddleware which is added after retryMiddleware in finalizeRequest step. At this point the request is already executing.
  • The request is sent in node-http-handler

For InvalidSignatureException to happen in Lambda, the request should have executed awsAuthMiddleware, but waited for five minutes before it's sent to the server. We've to investigate why this delay is being added.

This delay could be added by middleware added between awsAuthMiddleware and handler. The Node.js runtime can also cause this delay, if event loop is blocked - but five minutes of blocking is less likely. thanks for providing the data that this issue happens more often in Provisioned Concurrency. We've conveyed this information to Lambda Node.js Runtime team.

trivikr avatar Dec 01 '23 23:12 trivikr

A purposeful reproduction for InvalidSignatureException using middleware which adds a delay of 5+ minutes at the end of finalizeRequest step (i.e. between awsAuthMiddleware and handler):

import { SSM } from "@aws-sdk/client-ssm";

const client = new SSM();

const middlewareName = "delaySendingRequestMiddleware";
client.middlewareStack.add(
  (next) => async (args) => {
    const signHeader = args.request.headers["x-amz-date"];
    const year = parseInt(signHeader.substring(0, 4));
    const month = parseInt(signHeader.substring(4, 6)) - 1; // Months are zero-based
    const day = parseInt(signHeader.substring(6, 8));
    const hour = parseInt(signHeader.substring(9, 11));
    const minute = parseInt(signHeader.substring(11, 13));
    const second = parseInt(signHeader.substring(13, 15));
    const signDate = new Date(Date.UTC(year, month, day, hour, minute, second));
    console.log(`Request sign date: ${signDate}\n`);

    console.log(`Before wait: ${new Date()}`);

    const delayInSeconds = 305; // <- change this to below 300 to keep Signature valid.
    console.log(`Wait for ${delayInSeconds} seconds in ${middlewareName}..`);
    await new Promise((resolve) => setTimeout(resolve, delayInSeconds * 1000));

    console.log(`After wait: ${new Date()}\n`);
    return next(args);
  },
  {
    name: middlewareName,
    step: "finalizeRequest",
  }
);

await client.listDocuments({});

Error thrown when delayInSeconds is more than five minutes, i.e. 305:

Request sign date: Fri Dec 01 2023 11:38:18 GMT-0800 (Pacific Standard Time)

Before wait: Fri Dec 01 2023 11:38:18 GMT-0800 (Pacific Standard Time)
Wait for 305 seconds in delaySendingRequestMiddleware..
After wait: Fri Dec 01 2023 11:43:23 GMT-0800 (Pacific Standard Time)

/Users/trivikr/workspace/test/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8
    const response = new exceptionCtor({
                     ^

InvalidSignatureException: Signature expired: 20231201T193818Z is now earlier than 20231201T193823Z (20231201T194323Z - 5 min.)

No error thrown when delayInSeconds is less than five minutes, i.e. 295:

Request sign date: Fri Dec 01 2023 11:49:50 GMT-0800 (Pacific Standard Time)

Before wait: Fri Dec 01 2023 11:49:50 GMT-0800 (Pacific Standard Time)
Wait for 295 seconds in delaySendingRequestMiddleware..
After wait: Fri Dec 01 2023 11:54:45 GMT-0800 (Pacific Standard Time)

We need to find out who adds this 5+ minutes delay.

trivikr avatar Dec 01 '23 23:12 trivikr

Hey @trivikr @RanVaknin - any further updates regarding this? We had a production incident due to this issue (again).

bri-liu avatar Dec 11 '23 23:12 bri-liu

We too have a lot of issues due to this error 😞 We had to limit some of our scheduled jobs, otherwise were getting this error through every 5 mins. This made it almost impossible to support our system as we were just bombarded with this error. If there isn’t a fix in the pipeline I will have to disable our audit logging functionality but this becomes a compliance issue so I am reluctant to do that!

emmamoinat avatar Dec 12 '23 07:12 emmamoinat

We confirmed that this issue is caused because of the NodeJs Async Init Problem in Lambda. Below is the reproduction of the issue without the SDK.

Code creation

test-aws-lambda-async-init-record

Create a nodejs 20.x function which logs timestamps sent in function URL query, and log server timestamp.

export const handler = async (event) => {
  const queryParams = event.rawQueryString
    .split("&")
    .map((val) => val.split("="))
    .map(([key, value]) => [key, Number(value)]);

  const timestamps = {
    ...Object.fromEntries(queryParams),
    serverTimestamp: Date.now(),
  };

  console.log(timestamps);

  return timestamps;
};

Create function URL with Auth type NONE for this function.

test-aws-lambda-async-init

Create a nodejs 20.x function which logs timestamps in async init Populate value of function URL from previous code in lambdaFunctionUrl below.

import { get } from "https";

let asyncTimestamp;
const syncTimestamp = Date.now();

// Async Init Handler
(async () => {
  asyncTimestamp = Date.now();

  const query = Object.entries({ syncTimestamp, asyncTimestamp })
    .map(([key, value]) => `${key}=${value}`)
    .join("&");

  const lambdaFunctionUrl =
    "https://mueiwlpqyd6svbknhovlebcnbu0iowyo.lambda-url.us-west-1.on.aws/";
  get(`${lambdaFunctionUrl}?${query}`);
})();

export const handler = async () => {
  const timestamps = { syncTimestamp, asyncTimestamp };
  console.log(timestamps);
  return timestamps;
};

For this function:

  • Publish a version
  • Set provisioned concurrency of 1.
  • Set a timer for at least five minutes, after the status for provisioned concurrency says ready.

Code execution

Once the timer it up, invoke test-aws-lambda-async-init and examine logs of both the functions.

Example logs:

test-aws-lambda-async-init

INFO { syncTimestamp: 1702595243217, asyncTimestamp: 1702595243217 }

test-aws-lambda-async-init-record

INFO { syncTimestamp: 1702595243217, asyncTimestamp: 1702595243217, serverTimestamp: 1702595943243 }

Observations

In example logs, the difference between serverTimestamp and asyncTimestamp is 700026. This happens because test-aws-lambda-async-init is executed 700 seconds after provisioned concurrency was ready.

In production code, asyncTimestamp is where signing happens and the get is where http request call is made by the SDK. However, the Lambda infrastructure sends http request only when handler is called. The server (i.e. test-aws-lambda-async-init-record in the example) receives the request when the handler is called. If it's more than five minutes there's a signature mismatch.

We've provided this repro to AWS Lambda Node.js runtime team, and awaiting their response. Refs: JS-4851

trivikr avatar Dec 14 '23 23:12 trivikr

For a workaround, you can move the api call inside the Lambda handler as follows:

Before

// ...
const client = new SSM();
const secret = await client.getSecretValue(params);

const handler = async (event) => {
  // Use the secret.
}

After

// ...
const client = new SSM();
let secret;

const handler = async (event) => {
  if (secret === undefined) {
    secret = await client.getSecretValue(params);
  }
  // Use the secret.
}

This workaround ensures that SDK API calls will not face Node.js Async Init Problem, as the API call is made inside the handler. The API call will also be made just once per function init.

trivikr avatar Dec 15 '23 00:12 trivikr

Hey @trivikr, appreciate the workaround. Nonetheless, this solution is not ideal for me in terms of performance when we are talking with high volume requests. I've done this same example and noticed the average response time increased because of this (compared to the previous v2), therefore I won't be updating yet to v3.

Do you think this issue will be fixed in the short term? Thanks!

gastonlifschitz avatar Jan 02 '24 15:01 gastonlifschitz

Hi, Does anyone have news about this issue? I have the same problem.

osgioia avatar Feb 01 '24 03:02 osgioia

We started encountering this error in production with very low volumes of traffic (a few requests an hour). Our use-case is very basic. The Cloudwatch Client is initialised at the top level of a lambda outside the handler (for re-use, as recommended), and inside our async handler we make the PutMetricCommand. I can only assume that whatever signing is going on happens in the construction of the client and is somehow cached between requests.

Note: we are not using provisioned currency.

deanc avatar Mar 08 '24 19:03 deanc

Building on top of https://github.com/aws/aws-sdk-js-v3/issues/5192#issuecomment-1857021314 we noticed that the timing issue is not reproducible when using ESM with TLA.

test-aws-lambda-async-init

// ...
const getAsyncTimestamp = async () => {
  const asyncTimestamp = Date.now();

  const query = Object.entries({ syncTimestamp, asyncTimestamp })
    .map(([key, value]) => `${key}=${value}`)
    .join("&");

  const lambdaFunctionUrl =
    "https://h4yldxhffdqtjal2tny35siufy0nuozx.lambda-url.us-west-1.on.aws/"; // <-- Replace with your Lambda Function URL.
  
  return new Promise((resolve, reject) => {
    get(`${lambdaFunctionUrl}?${query}`, (res) => {
      res.on("data", (chunk) => {});
      res.on("end", () => resolve(asyncTimestamp));
    }).on("error", (e) => reject(e));
  });
};

const syncTimestamp = Date.now();
const asyncTimestamp = await getAsyncTimestamp(); // Top Level Await
// ...

Code execution

test-aws-lambda-async-init

2024-04-23T16:22:17.475Z	c6659f7f-bc88-4983-837e-272df94d42fe INFO	{
  syncTimestamp: 1713888903277,
  asyncTimestamp: 1713888903277
}

test-aws-lambda-async-init-record

2024-04-23T16:15:04.099Z	c4e93220-02a4-44f8-a077-0cc31a17be29	INFO	{
  syncTimestamp: 1713888903277,
  asyncTimestamp: 1713888903277,
  serverTimestamp: 1713888904099
}

Observations

The log for test-aws-lambda-async-init-record is created when test-aws-lambda-async-init is provisioned, i.e. 2024-04-23T16:15:04.099Z and the difference between serverTimestamp and asyncTimestamp is 822 ms, even though there was 5+ minutes wait between provisioning and invocation.

trivikr avatar Apr 23 '24 19:04 trivikr

The updated recommendation from AWS SDK for JavaScript team is to use Top Level Await (TLA) in your application.

// ...
const client = new SSM();
const secret = await client.getSecretValue(params); // <-- This TLA will not get blocked in Lambda Init stage.

const handler = async (event) => {
  // Use the secret.
}

Although not recommended, you can consider moving API calls inside the handler as mentioned in https://github.com/aws/aws-sdk-js-v3/issues/5192#issuecomment-1857037911 if you're not using TLA.

trivikr avatar Apr 23 '24 19:04 trivikr

const client = new SSM();
let secret;

const handler = async (event) => {
  if (secret === undefined) {
    secret = await client.getSecretValue(params);
  }
  // Use the secret.
}

@trivikr Regarding your suggestion for those of us not using TLA. We experienced this problem doing exactly as you suggest with the CloudWatch SDK (initialising the client outside the handler, and making the call inside of it PutMetricCommand). The example you use here would also cache the secret - meaning, depending how it's used, it might be stale if it's changed during the time the lambda is alive.

deanc avatar Apr 23 '24 20:04 deanc