up icon indicating copy to clipboard operation
up copied to clipboard

"Error: write EPIPE" on lambda errors (it worsens with provisioned concurrency)

Open gagoar opened this issue 5 years ago • 8 comments

Prerequisites

  • [x] I am running the latest version. (up upgrade)
  • [X] I searched to see if the issue already exists.
  • [X] I inspected the verbose debug output with the -v, --verbose flag.
  • [X] Are you an Up Pro subscriber?

Description

We have been using Up for our lambda solution (express server) for a while now (~2 years) and we were seeing some errors like this (but not that frequently):

{"timestamp":1580983517551,"message":"ERROR\tUncaught Exception\t{\"errorType\":\"Error\",\"errorMessage\":\"write EPIPE\",\"code\":\"EPIPE\",\"errno\":\"EPIPE\",\"syscall\":\"write\",\"stack\":[\"Error: write EPIPE\",\"    at WriteWrap.afterWrite [as oncomplete] (net.js:789:14)\"]}","extractedFields":{"event":"ERROR\tUncaught Exception\t{\"errorType\":\"Error\",\"errorMessage\":\"write EPIPE\",\"code\":\"EPIPE\",\"errno\":\"EPIPE\",\"syscall\":\"write\",\"stack\":[\"Error: write EPIPE\",\"    at WriteWrap.afterWrite [as oncomplete]

to provide a bit more context, we also get an odd message coming almost an hour after this error shows up. image

We decided to add provisioned concurrency and noticed that this error started happening more often.

Currently, this problem is producing 500 to some of our consumers.

Steps to Reproduce

not sure I can describe this in a generic way but we see it quite often with or without provisioned concurrency.

Love Up?

We are using Up pro currently.

gagoar avatar Feb 11 '20 00:02 gagoar

Hmm thewaiting for .... to be in a listening state timeout should only appear if your application isn't listening on PORT, even if it's only in some Lambda's. Do you connect to a database or similar before allowing connections?

tj avatar Feb 11 '20 10:02 tj

No, we don't establish any database connection. this code only deals with connections (REST) when queried (graphQL query); so not at start time.

gagoar avatar Feb 11 '20 18:02 gagoar

additional context; this Lambda is running in a VPC, and we have two scheduled benchmarking runs that occur every 10 and 30 minutes, respectively. currently, that's the only source of traffic to the Lambda.

we're also running apollo-server-express in this Lambda

jnwng avatar Feb 11 '20 18:02 jnwng

What do you have configured for the lambda memory size? You probably already know this, but with lower memory limits such as 128mb the CPU is also limited, so having a lot of require()s can impact cold starts.

15s is pretty long though even in that case, I would start by throwing a few console.log()s in the app just to make sure it's reaching your http server's .listen(PORT) call.

tj avatar Feb 12 '20 12:02 tj

for the vast majority of invocations, we've been noticing roughly ~5s of initialization, so had initially ruled that out. there's no harm in bumping it, though, even if it's temporary.

re: getting to the listen(PORT) call, it is still quite strange that we see this intermittently, but given that our listen(PORT) call is guarded by a branch, i think it's entirely possible that we're just never getting to the listen:

  if (env.NODE_ENV !== NODE_ENV_TEST) {
    server = app.listen({ port: env.PORT }, (): void => {
      logger.info(`🚀 Server ready at http://localhost:${env.PORT}/playground`);
    });
  }

(where env.NODE_ENV is a filtered version of process.env validated by envalid)

will do both and report back

jnwng avatar Feb 12 '20 16:02 jnwng

Interesting yea, really strange that it's intermittent. Regarding cold-start time you could try a bundler, just eliminating the requires alone seems to help quite a bit, but let me know if you find anything suspicious!

tj avatar Feb 13 '20 11:02 tj

Hi @tj, as a follow-up, we upgraded the lambda to 12.x and the errors regarding the "Uncaught Exception" went away. We are still giving it till the end of the week to make sure we are in the clear tho.

gagoar avatar Feb 13 '20 22:02 gagoar

Nice! Glad you found it, sounds good though I'll close in a few days if it seems like things are good to go

tj avatar Feb 15 '20 11:02 tj