stripe-node Intermittent Error: write EPIPE when running stripe client in AWS Lambda

We're using the Stripe Node client (stripe-node 8.71.0) on an AWS Lambda running Node 12.x. A Stripe customers.list call is made first thing when the Lambda executes. 33% of the time, we get this error on that call. It happens consistently, so it does not seem to be transient.

I did read https://github.com/stripe/stripe-node/issues/650, and setting maxNetworkRetries in Stripe to 2 seems to resolve the issue. However, it seems that this just masks the issue.

Is this a Stripe issue or an AWS Lambda issue? Probably Lambda; I submitted a request to AWS. But I'm posting this here in case others run into it.

2020-10-13T12:02:58.032Z c184006d-fe96-490a-9bfe-696b8271769a ERROR StripeConnectionError: An error occurred with our connection to Stripe.
    at /var/task/node_modules/stripe/lib/StripeResource.js:234:9
    at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:489:67)
    at ClientRequest.emit (events.js:315:20)
    at ClientRequest.EventEmitter.emit (domain.js:483:12)
    at TLSSocket.socketErrorListener (_http_client.js:426:9)
    at TLSSocket.emit (events.js:315:20)
    at TLSSocket.EventEmitter.emit (domain.js:483:12)
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  type: 'StripeConnectionError',
  raw: {
    message: 'An error occurred with our connection to Stripe.',
    detail: Error: write EPIPE
        at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:92:16)
        at writevGeneric (internal/stream_base_commons.js:132:26)
        at TLSSocket.Socket._writeGeneric (net.js:784:11)
        at TLSSocket.Socket._writev (net.js:793:8)
        at doWrite (_stream_writable.js:401:12)
        at clearBuffer (_stream_writable.js:519:5)
        at TLSSocket.Writable.uncork (_stream_writable.js:338:7)
        at ClientRequest.end (_http_outgoing.js:774:17)
        at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:506:15)
        at Object.onceWrapper (events.js:422:26) {
      errno: 'EPIPE',
      code: 'EPIPE',
      syscall: 'write'
    }
  },
  rawType: undefined,
  code: undefined,
  doc_url: undefined,
  param: undefined,
  detail: Error: write EPIPE
      at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:92:16)
      at writevGeneric (internal/stream_base_commons.js:132:26)
      at TLSSocket.Socket._writeGeneric (net.js:784:11)
      at TLSSocket.Socket._writev (net.js:793:8)
      at doWrite (_stream_writable.js:401:12)
      at clearBuffer (_stream_writable.js:519:5)
      at TLSSocket.Writable.uncork (_stream_writable.js:338:7)
      at ClientRequest.end (_http_outgoing.js:774:17)
      at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:506:15)
      at Object.onceWrapper (events.js:422:26) {
    errno: 'EPIPE',
    code: 'EPIPE',
    syscall: 'write'
  },
  headers: undefined,
  requestId: undefined,
  statusCode: undefined,
  charge: undefined,
  decline_code: undefined,
  payment_intent: undefined,
  payment_method: undefined,
  setup_intent: undefined,
  source: undefined
}

Oct 13 '20 17:10 hisham

We've seen this before with AWS Lambda and believe it's an issue or configuration setting on their end. Using maxNetworkRetries seems to do the trick in most cases, but, as you correctly stated, it masks the problem rather than solving it.

When you hear back from AWS would you mind updating this issue with your findings?

Oct 14 '20 01:10 paulasjes-stripe

Yeah, I have an AWS premium subscription, so I should have a response soon.

I did find similar issues that people have reported with other libraries:

https://github.com/aws/aws-sdk-js-v3/issues/1196
https://forums.aws.amazon.com/thread.jspa?messageID=927096

So my latest theory is that it's related to keep-alive and expiring sockets. For now, I've added the retry and am waiting for AWS to get back to me.

Oct 14 '20 02:10 hisham

Hi @paulasjes-stripe - here's the response we got from AWS:

Starting with the error: an "EPIPE" error [0] is generally caused when data is piped into a closed stream [1]. In the case of a Node.js Lambda function, the error might occur when the Node.js event loop did not clean up closed TCP connections from the HTTP connection pool and the Node.js runtime then attempted to use one of those closed TCP connections.

To understand the error better, below is what happens behind the scenes:

An AWS Lambda function runs in an isolated container, and usually each invocation starts a new Lambda function execution in a new container.

However, if the delay between two requests is very small, the container used by the previous invocation might be reused to serve the later request as well. This is known as container reuse [2].

When an execution finishes, Lambda does not take into account any processes still active in the background other than the handler function. Thus, when the execution is finished, those active processes are frozen.

When the container processes the next request, the previously frozen asynchronous processes resume.

If any of the frozen processes depends on a pipe or stream, it fails to continue execution, because it can no longer find the pipe, connection, or stream it used in the previous request.

To avoid these errors, we suggest the following:

Revisit the function code and ensure that processes that depend on a connection or stream have finished before the Lambda completes execution.

Use retries, which will create a new connection/stream for the new request.
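The first suggestion can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of async `tasks`, each standing in for any call that uses a network connection (such as a Stripe request). The first handler starts its work without awaiting it, so it can return, and Lambda can freeze the process, while requests are still in flight; the second waits for the work before returning.

```javascript
// `tasks` is a hypothetical list of async functions, each standing in for a
// call that uses a network connection (for example, a Stripe request).

// Risky: work is started but not awaited, so the handler can resolve (and
// Lambda can freeze the process) while requests are still in flight.
async function handlerFireAndForget(event, tasks) {
  for (const task of tasks) {
    task(event).catch(() => {}); // result and errors are silently dropped
  }
  return 'done';
}

// Safer: wait for every task to complete before the handler returns.
// Note that Promise.all rejects as soon as any single task fails.
async function handlerAwaitAll(event, tasks) {
  await Promise.all(tasks.map((task) => task(event)));
  return 'done';
}
```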

I hope the above information gives you an idea of EPIPE errors and why adding retries may help resolve them.

If there are any further queries or concerns, please let me know and I will be happy to assist.

References:
[0] https://nodejs.org/api/errors.html
[1] EPIPE error - https://github.com/nodejs/node/issues/947
[2] https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/

So I'll just use Stripe's retry logic for now, as I don't seem to have control over Stripe's background processes. Is the keep-alive connection causing this issue? I'm not sure.

Our Lambda is very simple; it basically just returns the result of this line:

await this.stripeClient.customers.list({ email })

It's a 2048 MB Lambda running Node.js 12. It is called via a GraphQL function transformer (https://docs.amplify.aws/cli/function), but I don't think those details matter much.

Interestingly, I have other Lambdas that also call the above REST API but make other network calls and involve more logic, and I've never run into the EPIPE issue with them.

Oct 14 '20 17:10 hisham

Thanks @hisham! We're going to look into this to see if there's anything that can be done on our end, but it looks like maxNetworkRetries is a suitable workaround for now.

Oct 14 '20 23:10 paulasjes-stripe

Great. Yes, maxNetworkRetries does the job. AWS seems to agree with me that calling the destroy method on the HTTP agent before the Lambda exits will probably also resolve this issue:

The AWS Lambda best practices [1][2] recommend using a keep-alive directive to maintain persistent connections. Quoting from the documentation: "Lambda purges idle connections over time. Attempting to reuse an idle connection when invoking a function will result in a connection error. To maintain your persistent connection, use the keep-alive directive associated with your runtime."

However, in certain situations, depending on the time between two Lambda invocations, an idle connection may still be present and cause an error.

Therefore, it sounds right to call agent.destroy() before exiting the Lambda to destroy all connections. But you need to make sure that the code that closes/destroys all connections runs before the Lambda exits. This ensures that socket connections are not left hanging open.
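A minimal sketch of the agent.destroy() approach, assuming one module-level keep-alive https.Agent that your HTTP client is configured to use (stripe-node accepts a custom agent through its httpAgent option). withAgentCleanup is a hypothetical wrapper, not an AWS or Stripe API. Node's Agent#destroy() destroys both idle and in-use sockets, so it runs in a finally block only after the handler's own work has been awaited. As far as I know, destroy() closes the sockets but leaves the agent itself usable, so the next invocation simply opens fresh connections.

```javascript
const https = require('https');

// One keep-alive agent per container (module scope). Assumption: your HTTP
// client is configured to use it, e.g. via stripe-node's `httpAgent` option.
const agent = new https.Agent({ keepAlive: true });

// Hypothetical wrapper (not an AWS or Stripe API): runs `handler`, then
// destroys every socket held by `httpAgent`, idle or in use, so no pooled
// connection survives into a frozen container.
function withAgentCleanup(httpAgent, handler) {
  return async (event, context) => {
    try {
      // Await the handler's own work first: destroy() would abort any
      // request that is still in flight.
      return await handler(event, context);
    } finally {
      httpAgent.destroy();
    }
  };
}

// Hypothetical usage with stripe-node:
//   const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, { httpAgent: agent });
//   exports.handler = withAgentCleanup(agent, (event) =>
//     stripe.customers.list({ email: event.email }));
```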

As a workaround, retries, as you mentioned, have been found to work fine.

I hope this information helps. If there are any further queries, please let me know.

[1] https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
[2] https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-reusing-connections.html

Oct 16 '20 02:10 hisham

@hisham Do you happen to know whether I can wrap my Stripe method calls in try/catch if I want to use maxNetworkRetries? I'm also using AWS Lambdas, and I'm worried that the function will exit prematurely in that case...

Nov 08 '20 11:11 huntedman

@huntedman We are using maxNetworkRetries and are not wrapping the calls in try/catch. Stripe seems to handle this internally.
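To illustrate why a try/catch should not interfere with retries: the retries happen inside the promise that the API call returns, so an await inside try/catch only sees the final outcome. The sketch below uses a hypothetical withRetries helper as a stand-in for stripe-node's built-in maxNetworkRetries logic (it is not Stripe's implementation and has no backoff), and a hypothetical callApi function in place of a real Stripe call. It treats EPIPE and ECONNRESET as retryable and checks both err.code and err.detail.code, because the StripeConnectionError in the log at the top of this issue carries the socket error in its detail field.

```javascript
// Codes that signal a broken pooled connection (an assumption: adjust to taste).
const RETRYABLE_CODES = new Set(['EPIPE', 'ECONNRESET']);

// StripeConnectionError has `code: undefined` and the socket error in
// `detail` (see the log in this issue), so look in both places.
function isRetryable(err) {
  const code = (err && err.code) || (err && err.detail && err.detail.code);
  return RETRYABLE_CODES.has(code);
}

// Hypothetical helper (not stripe-node's implementation): retry `fn` up to
// `maxRetries` extra times on a stale-connection error, with no delay.
async function withRetries(fn, maxRetries = 2) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
    }
  }
}

// The catch block only runs after retries are exhausted or for a
// non-retryable error; wrapping the call does not cut retries short.
async function handler(event, callApi) {
  try {
    return { ok: true, data: await withRetries(() => callApi(event)) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}
```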

Nov 08 '20 17:11 hisham

Hi @hisham sorry for the radio silence about this recently but I'm checking in with a quick update. The response from AWS was very helpful to us (thank you!) and we're actively investigating this issue to provide a better fix than our suggested workaround. When we know more we'll definitely update you here again via this open issue. Thank you for your patience!

Dec 18 '20 17:12 suz-stripe

I've spent some time experimenting with AWS lambda, and have a better understanding of these errors. They are happening due to the interaction between

how Lambda freezes/unfreezes processes
and how stripe-node (by default) uses a single http Agent with keep-alive enabled

In case you're not familiar, keep-alive is a way for http clients like stripe-node to be more efficient when your application is making multiple requests to Stripe. Rather than making a new connection for each request, which has a performance cost, it keeps the connection to the server open after a request is finished, so that it can be reused on the next request. In order for keep-alive to work, the open connection must ping the server every so often to let the server know that it is still active. If it doesn't, the server will assume the connection isn't active anymore and close the connection to make room for others.
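A small sketch of connection reuse, using only Node's built-in http module and a throwaway local server as a stand-in for Stripe's API (https.Agent accepts the same keepAlive option). It counts how many TCP connections the server accepts for two back-to-back requests. The helper names get and countConnections are illustrative, not part of any library.

```javascript
const http = require('http');

// Make one GET request through `agent` and resolve once the body is consumed.
function get(port, agent) {
  return new Promise((resolve, reject) => {
    http
      .get({ host: '127.0.0.1', port, path: '/', agent }, (res) => {
        res.resume(); // drain the body so the response can finish
        res.on('end', resolve);
      })
      .on('error', reject);
  });
}

// Count the TCP connections a local server accepts for two sequential
// requests sent through an agent with the given keepAlive setting.
async function countConnections(keepAlive) {
  let connections = 0;
  const server = http.createServer((req, res) => res.end('ok'));
  server.on('connection', () => {
    connections += 1;
  });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();

  const agent = new http.Agent({ keepAlive });
  try {
    await get(port, agent);
    // Let the agent return the socket to its free pool before the next request.
    await new Promise((resolve) => setImmediate(resolve));
    await get(port, agent);
  } finally {
    agent.destroy(); // close any pooled client sockets
    await new Promise((resolve) => server.close(resolve));
  }
  return connections;
}
```

With keepAlive: true the second request should reuse the pooled socket (one connection); with keepAlive: false each request pays for its own connection (two).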

The problem arises when Lambda freezes your Node process. While the process is frozen, the TCP connections can't ping the server to remain active, and the server closes them. When Lambda unfreezes your process, Node isn't aware that the connections have been closed, and it attempts to re-use them. As soon as it does, it gets EPIPE or ECONNRESET.

One option for eliminating these errors would be to disable keep-alive when you initialize stripe-node.

const https = require('https')
const stripe = require('stripe')('sk_live_xyz', {httpAgent: new https.Agent({keepAlive: false})})

This does mean sacrificing the benefits of keep-alive, but I expect that's an acceptable trade-off especially for low-traffic lambdas.

Another possibility would be initializing a new Stripe client with its own keep-alive-enabled agent inside the Lambda handler. This is roughly equivalent to Amazon's suggestion of calling .destroy on the http agent before exiting, but this isn't ideal either because it only allows you to re-use connections within each individual Lambda invocation, and not from one Lambda invocation to the next.

From my perspective, handling these errors by retrying is likely the proper approach and shouldn't necessarily be viewed as a workaround or as masking an underlying issue. It is expected, and unavoidable, that these broken connections will come to exist, and there doesn't seem to be an obvious way of asking Node "how long has it been since the last keep-alive probe on this connection?" other than writing to the connection and triggering the error.

At the same time, I think we should look into the possibility of making stripe-node handle errors like this by default/more transparently, so that users don't have to configure the retries themselves. That seems to be what Amazon started doing for errors like this for their own SDKs about a month ago (thank you @hisham for linking to that issue, by the way).

Anyway I hope this clarifies things and we'll keep you posted.

Dec 19 '20 02:12 richardm-stripe

Thank you for opening this thread! I had the same issue on a very low-traffic site (a side project). I used Stripe's Node library inside Netlify functions and got 502 errors with the error message write EPIPE in the Netlify function logs.

I went ahead with the fix you recommended, @richardm-stripe, but the syntax didn't work. The following did work, though:

const https = require('https');
const stripe = require('stripe')('secret_key_xyz', {
  httpAgent: new https.Agent({keepAlive: false})
});

Feb 09 '21 13:02 theoBLT

Thanks @theoBLT, I've corrected the syntax in the original comment.

Feb 09 '21 17:02 richardm-stripe

Oof, #1336 claimed to fix this, so the issue was auto-closed, but I don't consider it entirely fixed until retries are enabled by default.

May 09 '22 23:05 richardm-stripe

I also hit this issue in v8; I upgraded to v9 and everything looks good now.

Automatic retries are now in place for CONNECTION_CLOSED_ERROR_CODES: https://github.com/stripe/stripe-node/commit/47776ef15ef3590980c48d6dece1ea54657fdbca

Jul 24 '22 08:07 FeliceGeracitano

Hello! maxNetworkRetries has been set to 1 by default with the release of stripe-node v13 today (enabled by this change). I'll be closing this issue, as the default behavior in v13 should prevent this error.

Aug 16 '23 21:08 anniel-stripe
