aws-sdk-js-v3

Secrets Manager EPROTO error

Open steelbrain opened this issue 2 years ago • 10 comments

Describe the bug

We're using Secrets Manager to initialize Lambda state and are frequently getting write EPROTO failures. It started happening after we upgraded from v3.41.0 to v3.58.0.

Your environment

SDK version number

@aws-sdk/client-secrets-manager@3.58.0

Is the issue in the browser/Node.js/ReactNative?

Node.js

Details of the browser/Node.js/ReactNative version

Node.js 14.x Lambda :)

Steps to reproduce

Here's the tl;dr of the Lambda handler code:

const { SecretsManagerClient, GetSecretValueCommand } = require('@aws-sdk/client-secrets-manager')

const promiseEnv = new SecretsManagerClient({
  region: process.env.AWS_ENV_SECRET_REGION,
}).send(
  new GetSecretValueCommand({
    SecretId: process.env.AWS_ENV_SECRET_ID,
  })
)

async function handler(event, context) {
  console.log('Requesting environment variables')
  const env = await promiseEnv
  console.log('Got environment variables')
  // ....
}

module.exports = { handler }

Observed behavior

Most of the time everything works, but then it unexpectedly crashes at await promiseEnv, and "Got environment variables" is never logged.

Expected behavior

Secrets Manager would keep working

Screenshots

N/A

Additional context

Here's the raw logs:
[TS] [UUID] INFO	Requesting environment variables
[TS] [UUID] ERROR	Invoke Error 	{"errorType":"Error","errorMessage":"write EPROTO","code":"EPROTO","errno":-71,"syscall":"write","$metadata":{"attempts":1,"totalRetryDelay":0},"stack":["Error: write EPROTO","    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16)","    at WriteWrap.callbackTrampoline (internal/async_hooks.js:130:17)"]}
[TS] [UUID] ERROR	(node:9) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 14)\n(Use `node --trace-warnings ...` to show where the warning was created)
END RequestId: [UUID]
REPORT RequestId: [UUID]	Duration: 33.91 ms	Billed Duration: 34 ms	Memory Size: 1536 MB	Max Memory Used: 99 MB	Init Duration: 1213.34 ms

steelbrain avatar Apr 05 '22 06:04 steelbrain
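
One detail of the handler above is independent of the EPROTO error itself: `promiseEnv` is created once at module load, so if that single request rejects, every later invocation that reuses the same execution environment awaits the same rejected promise. A minimal sketch of a cache that forgets a failed promise so the next invocation starts a fresh request. `memoizeWithReset` and `getEnv` are hypothetical names, not part of the SDK, and this only limits the impact of one failed request; it does not address the underlying error.

```javascript
// Hypothetical helper: caches the promise returned by `factory`, but forgets
// it if it rejects, so the next call starts a fresh attempt.
function memoizeWithReset(factory) {
  let cached = null
  return function get() {
    if (cached === null) {
      cached = Promise.resolve()
        .then(factory)
        .catch((err) => {
          cached = null // drop the failed promise; do not cache the rejection
          throw err
        })
    }
    return cached
  }
}

// Usage sketch (assumes the client and command from the report above):
// const getEnv = memoizeWithReset(() =>
//   new SecretsManagerClient({ region: process.env.AWS_ENV_SECRET_REGION }).send(
//     new GetSecretValueCommand({ SecretId: process.env.AWS_ENV_SECRET_ID })
//   )
// )
// async function handler(event, context) {
//   const env = await getEnv()
// }
```

Unlike the original, this sketch starts the request on the first invocation rather than during the init phase.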

I got the same issue with "@aws-sdk/client-secrets-manager": "^3.53.0", and the same behavior: it works most of the time, then unexpectedly crashes. It crashes at 'await secretsManagerClient.send'.

const secretsManagerClient = new SecretsManagerClient({
    credentials: local ? defaultProvider({ profile: AwsProfile }) : undefined,
    region: REGION
});

    static #mySecrets = async (secretName) => {
        let data;
        try {
            data = await secretsManagerClient.send(
                new GetSecretValueCommand({ SecretId: secretName })
            );
            return data; // For unit tests.
        } catch (err) {
            console.log('err', err);
        }
    };

semmgeorge avatar Apr 05 '22 17:04 semmgeorge

Noticing this same behavior on v3.67.0

samkotlove avatar Apr 20 '22 13:04 samkotlove

We have the same error, and it has caused serious problems in our production environment for the past few weeks. I switched to environment variables and disabled the Secrets Manager client.

tux86 avatar May 19 '22 20:05 tux86

Same on our live environment. We initially updated @aws-sdk/client-secrets-manager from version 3.20.0 to 3.52.0. Our lambdas then started throwing spikes of the following errors at random intervals throughout the day:

error.code EPROTO
error.errno -71
error.errorMessage write EPROTO
error.errorType Error
error.stack.0 Error: write EPROTO
error.stack.1 at __node_internal_captureLargerStackTrace (internal/errors.js:412:5)
error.stack.2 at __node_internal_errnoException (internal/errors.js:542:12)
error.stack.3 at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16)
error.syscall write
errorType AwsError
stack.0 AwsError
stack.1 at /var/task/packages/aws/dist/secretsManager/secretsManager.js:11:38
stack.2 at processTicksAndRejections (internal/process/task_queues.js:95:5)
stack.3 at async Promise.all (index 1)

We then upgraded to 3.89.0, thinking the issue might have been fixed in the meantime, but we are seeing the same behavior.

Update: downgrading back to version 3.20.0 seems to have resolved it for now.

ppsmol24 avatar May 20 '22 17:05 ppsmol24

We are also seeing this issue on v3.130.0 and have opted for the workaround of downgrading to 3.20.0. Any updates @RanVaknin?

hikarunoryoma avatar Jul 20 '22 16:07 hikarunoryoma

@AllanZhengYP @RanVaknin is this an issue you've seen? We have been seeing it quite a few times in recent weeks, with a very recent aws-sdk v3.

ERROR	Error: write EPROTO
    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16) {
  errno: -71,
  code: 'EPROTO',
  syscall: 'write',
  '$metadata': { attempts: 1, totalRetryDelay: 0 }
}

jjpepper avatar Jul 26 '22 06:07 jjpepper

@AllanZhengYP @RanVaknin we've investigated this a bit more. It seems that we are seeing the EPROTO error after the lambda times out, and then tries to re-initialise (i.e. we see our cold start code again in the same log group).

jjpepper avatar Jul 26 '22 08:07 jjpepper

We recently began moving a variety of microservices from AWS SDK v2 to v3 and have seen variants of this error in several repos, most recently with 3.154.0.

sans-jmansfield avatar Aug 31 '22 18:08 sans-jmansfield

Hi All,

Unfortunately I'm not able to reproduce this issue. We have multiple issues open for the same EPROTO error; I tried reproducing it with 2 customer examples and never ran into it. I have assigned it to the dev team to take a look.

RanVaknin avatar Sep 01 '22 17:09 RanVaknin

One of my colleagues had done some analysis and suspects the issue is due to the clock being momentarily wrong when the lambda starts up.

jjpepper avatar Sep 01 '22 21:09 jjpepper

I too have encountered this over and over, and my educated guess is that it happens when some (unrelated) code blocks the event loop a bit too long. AWS services seem to have short timeouts when dealing with connections, and the SDK does not retry them, so blocking the JS event loop would delay the connection handling and cause the connection to fail with this error.

Dantemss avatar Oct 17 '22 15:10 Dantemss
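
The logs earlier in the thread show `"$metadata":{"attempts":1,"totalRetryDelay":0}`, which is consistent with the failed request not being retried. If the failure is transient, as guessed above, one possible stopgap is an application-level retry for this specific error code. A minimal sketch: `retryOnEproto`, its options, and the backoff values are hypothetical, and retrying like this is only reasonable for idempotent reads such as `GetSecretValue`.

```javascript
// Hypothetical helper: re-runs `operation` when it fails with code EPROTO.
// Any other error, or exhausting `maxAttempts`, rethrows the last error.
async function retryOnEproto(operation, { maxAttempts = 3, baseDelayMs = 50 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await operation()
    } catch (err) {
      lastError = err
      const retryable = Boolean(err) && err.code === 'EPROTO'
      if (!retryable || attempt === maxAttempts) break
      // Linear backoff: baseDelayMs, 2 * baseDelayMs, ... (illustrative values)
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt))
    }
  }
  throw lastError
}

// Usage sketch (assumes an existing client, as in the snippets above):
// const data = await retryOnEproto(() =>
//   secretsManagerClient.send(new GetSecretValueCommand({ SecretId: secretName }))
// )
```

This treats the symptom only; it does not explain why the write fails in the first place.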

Having the same issue.

bdevore17 avatar Oct 19 '22 18:10 bdevore17

@RanVaknin What's the status here? This is crashing mission-critical processes for us and it's been assigned P1 for over a month...

bdevore17 avatar Oct 19 '22 18:10 bdevore17

@RanVaknin ?????

bdevore17 avatar Oct 26 '22 14:10 bdevore17

On a quick revisit during the review meeting for issues with p1 labels, we noticed that this issue is likely in Node.js. Search results: https://github.com/search?q=repo%3Anodejs%2Fnode+EPROTO&type=issues

We need to find out whether the issue is with the Node.js setup which Lambda follows, some Node.js configuration which the SDK sets, or a bug in Node.js core itself.

The requirement is to provide minimal repro code which makes multiple Secrets Manager getSecretValue calls. This will help us log more information and find out whether the issue is specific to Lambda, Node.js or the SDK.

For reference, here is a package which attempted to repro npm ping test failure from CodeBuild https://github.com/trivikr/aws-codebuild-npm-ping-test

trivikr avatar Nov 11 '22 22:11 trivikr
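
As a starting point for the repro requested above, here is a minimal sketch of a harness that makes repeated calls and tallies the outcomes by error code. `tallyOutcomes` is a hypothetical helper, the iteration count is arbitrary, and nobody in this thread has confirmed that sequential calls like this trigger the error.

```javascript
// Hypothetical harness: runs `call` sequentially `iterations` times and
// tallies successes and failures by error code (e.g. "EPROTO").
async function tallyOutcomes(call, iterations) {
  const tally = { ok: 0 }
  for (let i = 0; i < iterations; i += 1) {
    try {
      await call(i)
      tally.ok += 1
    } catch (err) {
      const key = (err && (err.code || err.name)) || 'unknown'
      tally[key] = (tally[key] || 0) + 1
    }
  }
  return tally
}

// Usage sketch inside a Lambda handler (client construction as in the report):
// const client = new SecretsManagerClient({ region: process.env.AWS_ENV_SECRET_REGION })
// const tally = await tallyOutcomes(
//   () => client.send(new GetSecretValueCommand({ SecretId: process.env.AWS_ENV_SECRET_ID })),
//   100
// )
// console.log(JSON.stringify({ node: process.version, tally }))
```

Logging `process.version` next to the tally makes it easier to compare runs across Lambda runtimes.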

Has anyone found that using a newer version of Node makes this issue go away? I am planning on upgrading my version of Node, but I was curious if anyone else has already tried this.

Like the OP, I am also using 14.x, but I am planning on updating to 18.x.

hikarunoryoma avatar Dec 15 '22 18:12 hikarunoryoma

We are also seeing this error regularly now, on Node 14 with the latest SDK packages, when trying to assume a role with stsClient.send(assumeRoleCommand). We are wondering whether a Node upgrade would help.

  • @aws-sdk/client-s3: ^3.235.0 && @aws-sdk/client-sts: ^3.235.0

michaelmrn avatar Dec 22 '22 11:12 michaelmrn

We also see this error fairly frequently with @aws-sdk/client-secrets-manager v3.131.0 and a Node 14.x lambda environment.

It looks like the following issues are closely related which implies it may not exclusively be a secrets manager issue:

  • https://github.com/aws/aws-sdk-js-v3/issues/3219
  • https://github.com/aws/aws-sdk-js-v3/issues/3476

james-m-hall avatar Jan 10 '23 19:01 james-m-hall

Regularly getting this with the SSM client v3.229.0, NodeJS 14. Seems like it's a global issue across many of the clients

dgoemans avatar Jan 17 '23 09:01 dgoemans

Yesterday, after posting the above comment, I decided to upgrade my lambdas to Node 16, and so far this hasn't happened again. It might be too soon to say, but @RanVaknin, this may be something to pass on to the dev team investigating.

cc @hikarunoryoma (since you asked)

dgoemans avatar Jan 18 '23 11:01 dgoemans

@dgoemans Thanks for the heads up! Looking forward to upgrading my lambdas next month and will follow up if I see success on my end!

hikarunoryoma avatar Jan 18 '23 13:01 hikarunoryoma

We tried the Lambda extension for fetching secrets from Secrets Manager, and that has worked quite well: https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html

jeeteshchel avatar Feb 07 '23 18:02 jeeteshchel
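
For anyone evaluating the extension route, a minimal sketch of what a call to it can look like. It assumes the AWS Parameters and Secrets Lambda Extension layer is attached to the function, and it relies on my understanding of the linked documentation: a local HTTP endpoint on port 2773 (overridable through `PARAMETERS_SECRETS_EXTENSION_HTTP_PORT`) that expects the function's `AWS_SESSION_TOKEN` in an `X-Aws-Parameters-Secrets-Token` header. Verify these details against the docs before use; `buildExtensionRequest` and `getSecretViaExtension` are hypothetical names.

```javascript
const http = require('http')

// Builds the URL and headers for the extension's local endpoint.
// Port, path and header name are taken from the AWS docs as understood here.
function buildExtensionRequest(secretId, env = process.env) {
  const port = env.PARAMETERS_SECRETS_EXTENSION_HTTP_PORT || '2773'
  return {
    url: `http://localhost:${port}/secretsmanager/get?secretId=${encodeURIComponent(secretId)}`,
    headers: { 'X-Aws-Parameters-Secrets-Token': env.AWS_SESSION_TOKEN },
  }
}

// Fetches a secret through the extension and resolves with the parsed JSON body.
function getSecretViaExtension(secretId) {
  const { url, headers } = buildExtensionRequest(secretId)
  return new Promise((resolve, reject) => {
    http
      .get(url, { headers }, (res) => {
        let body = ''
        res.setEncoding('utf8')
        res.on('data', (chunk) => {
          body += chunk
        })
        res.on('end', () => {
          if (res.statusCode !== 200) {
            reject(new Error(`Extension returned HTTP ${res.statusCode}: ${body}`))
            return
          }
          try {
            resolve(JSON.parse(body))
          } catch (err) {
            reject(err)
          }
        })
      })
      .on('error', reject)
  })
}

// Usage sketch inside a handler:
// const secret = await getSecretViaExtension(process.env.AWS_ENV_SECRET_ID)
```

Because the request goes to the extension over plain HTTP on localhost, the function code itself no longer opens a TLS connection to Secrets Manager.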

An upgrade to Node 18 appears to have resolved this for us.

michaelmrn avatar Mar 02 '23 09:03 michaelmrn

Indeed, 6 weeks after upgrading to Node 16 we haven't seen the issue again. Seems to be Node 14 only.

dgoemans avatar Mar 02 '23 09:03 dgoemans

I updated from Node 14 to Node 18 and no longer see this issue! Agreed that this is some issue with Node 14 interfacing with the latest AWS SDK.

hikarunoryoma avatar May 05 '23 15:05 hikarunoryoma
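
Since the reports above point at Node 14 specifically, a function that cannot be upgraded right away could at least log when it is running on that runtime, which helps correlate errors with the Node version. A minimal sketch: `nodeMajor` and `warnIfAffectedRuntime` are hypothetical helpers, and the "Node 14 only" condition reflects the observations in this thread, not a confirmed root cause.

```javascript
// Hypothetical helper: extracts the major version from a string like "14.21.3".
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split('.')[0], 10)
}

// Logs a warning and returns true when running on Node 14, the only runtime
// for which this thread reports the EPROTO errors.
function warnIfAffectedRuntime(version = process.versions.node, log = console.warn) {
  const affected = nodeMajor(version) === 14
  if (affected) {
    log(`Running on Node ${version}; write EPROTO errors have been reported on Node 14.x`)
  }
  return affected
}

// Usage sketch: call once at module load, outside the handler.
// warnIfAffectedRuntime()
```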

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

github-actions[bot] avatar Jul 12 '23 00:07 github-actions[bot]