Fetch failed with UND_ERR_CONNECT_TIMEOUT error on Next.js serverless function on Vercel Production env

Open andremendonca03 opened this issue 1 year ago • 35 comments

TL;DR: A fetch request made from a serverless function intermittently fails with UND_ERR_CONNECT_TIMEOUT in a Next.js production environment hosted on Vercel.

I currently have a Next.js site (v14.1.4, Pages Router) running on the Vercel platform (Node 20.x) that sends fetch requests to the Slack API from an API route running as a serverless function. This is the sequence that leads to the error:

  1. From a client-side component, send a fetch POST request to an API endpoint (route handler) on form submission;
  2. In the API serverless function, make another fetch POST request to an external API (in my case the Slack message API, "https://slack.com/api/chat.postMessage");
  3. In a production environment hosted on Vercel, around 70% of the requests to Slack succeed while the other 30% fail with a 500 server error and the code "UND_ERR_CONNECT_TIMEOUT".
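The flow in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the issue: the helper name `postToSlack`, the `FetchLike` type, and the 502 status returned on failure are all assumptions made for the example. The point it shows is that wrapping the outbound call in try/catch turns a connect timeout into a value the route can report, instead of an unhandled rejection that ends the process.

```typescript
// Hypothetical stand-in type so the sketch runs without Next.js installed.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical helper: posts a message to Slack and never throws, so a
// connect timeout becomes a result value instead of an unhandled rejection.
async function postToSlack(
  token: string,
  channel: string,
  text: string,
  fetchImpl: FetchLike = fetch,
): Promise<{ ok: boolean; status: number; error?: string }> {
  try {
    const res = await fetchImpl("https://slack.com/api/chat.postMessage", {
      method: "POST",
      headers: {
        "Content-Type": "application/json; charset=utf-8",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ channel, text }),
    });
    return { ok: res.ok, status: res.status };
  } catch (err) {
    // Node's fetch (undici) attaches the underlying network error as `cause`.
    const code = (err as { cause?: { code?: string } } | undefined)?.cause?.code;
    // 502 is an arbitrary choice for this sketch.
    return { ok: false, status: 502, error: code ?? "FETCH_FAILED" };
  }
}
```

Inside an API route, the result could be forwarded to the client, for example by responding with `result.status` and a JSON body. This does not remove the timeouts; it only keeps them from crashing the function.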

Full error message:

Unhandled Rejection: TypeError: fetch failed
    at node:internal/deps/undici/undici:12345:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:7492:28)
      at node:internal/deps/undici/undici:7448:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:7480:13)
      at process.processImmediate (node:internal/timers:478:21)
      at process.callbackTrampoline (node:internal/async_hooks:130:17) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
Node.js process exited with exit status: 128. The logs above can help with debugging the issue.

IMPORTANT NOTE: I've seen this issue happen with both the Pages and App routers, but more frequently with the Pages router. It also only happens in production environments hosted on Vercel. I could not personally reproduce it on my local server or in dev environments.

Reproducible By

This CodeSandbox contains an example of the code used to trigger the fetch requests, from the form submission through to the serverless function API call: https://codesandbox.io/p/devbox/gifted-shirley-mzlvgy?file=%2Fapp%2Fapi%2Froute.js%3A1%2C1-39%2C1

Expected Behaviour

Currently, some external API calls from a serverless function fail with unhandled undici fetch errors. The expected behaviour is that no errors are returned and the API call succeeds every time.

Environment

Operating System: Vercel Servers

Binaries:
  Node: v20x (default vercel v20 setting)
  npm: 10.2.3
  yarn: 1.22.19
  build command: yarn build

Relevant Packages:
  next: 14.1.4
  react: 18.2.0
  react-dom: 18.2.0

Additional Context

Additional information about the issue and more cases can be found at:
https://github.com/vercel/next.js/discussions/57384
https://github.com/vercel/next.js/issues/66373

andremendonca03 avatar Jun 04 '24 16:06 andremendonca03

I am also getting this error, but in my project it happens at build time. It only happens on Vercel; there are no issues locally. The API response times are also not unreasonable, i.e. 2 seconds per API call at most (measured locally). I'm not sure how to measure them on Vercel at build time.

The Next.js version is 13.1.6, and I've tried Node 18.x and 20.x plus NODE_OPTIONS.

kpratik2015 avatar Jun 05 '24 17:06 kpratik2015

Seems I'm encountering the same thing - no issues locally - timeouts on Vercel.

Running next 14.2.3

zackproser avatar Jun 05 '24 18:06 zackproser

Which external APIs are you trying to fetch specifically and which method are you using? @zackproser @kpratik2015

andremendonca03 avatar Jun 05 '24 22:06 andremendonca03

@andremendonca03 We're using our internal API, no third-party API. Our API is GraphQL, and it used to work on Vercel without any issue.

kpratik2015 avatar Jun 06 '24 09:06 kpratik2015

@andremendonca03 I was able to resolve my problem by setting --no-experimental-fetch in the NODE_OPTIONS environment variable on Vercel. It also gave me a better error log, which led me to increase the timeout in next.config.js with staticPageGenerationTimeout: 1000.

kpratik2015 avatar Jun 06 '24 15:06 kpratik2015

I'm having the same issue, only on prod, and only sporadically. Is there a fix?

nachodeh avatar Jun 07 '24 06:06 nachodeh

Similar sporadic issue; only on vercel, and never able to reproduce locally:

Node.js process exited with exit status: 128. The logs above can help with debugging the issue.

Previously happened when my db (supabase) calls a Vercel endpoint, which could take longer than 5s (max timeout on the webhook). My guess was that supabase closed the connection before the serverless function could fully execute.

Now it's happening again, and I'm 100% sure the call is taking less than 5 seconds. We should consider reaching out to support at this point

lookevink avatar Jun 09 '24 23:06 lookevink

@kpratik2015 if you set --no-experimental-fetch, do you also need to import node-fetch?

We're having the same problem (calling external API from route handler causes undici timeout) and reached out to support. This just started happening by itself and IMO has something to do with Vercel networking or NextJS node version.

rafalzawadzki avatar Jun 10 '24 10:06 rafalzawadzki

@andremendonca03 I was able to resolve my problem by setting --no-experimental-fetch in NODE_OPTIONS environment variable in vercel. It also helped me to get better error log which required me to increase timeout in next.config.js as staticPageGenerationTimeout: 1000,

@kpratik2015 if you set --no-experimental-fetch, do you also need to import node-fetch?

No, I've made no code changes (apart from config change I mentioned) and we continue to simply use fetch with Node v20 set in Vercel.

@PratikKataria-plivo how do you set the env variable on vercel? Is it just a new variable name NODE_OPTIONS and value --no-experimental-fetch?

nachodeh avatar Jun 10 '24 15:06 nachodeh

Yup, in project settings -> Environment Variables

kpratik2015 avatar Jun 10 '24 15:06 kpratik2015

I have also been getting a bunch of undici errors recently. The three main ones are listed below.

"next": "^14.2.3", node v20.9.0

I also tried setting the Vercel env variable NODE_OPTIONS=--dns-result-order=ipv4first, but it has not solved the issue.

`TypeError: fetch failed
    at node:internal/deps/undici/undici:12618:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: Error: connect ETIMEDOUT 76.76.21.241:443
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)
      at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:128:17) {
    errno: -110,
    code: 'ETIMEDOUT',
    syscall: 'connect',
    address: '76.76.21.241',
    port: 443
  }
}`

`TypeError: fetch failed
    at node:internal/deps/undici/undici:12618:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:7760:28)
      at node:internal/deps/undici/undici:7716:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:7748:13)
      at process.processImmediate (node:internal/timers:476:21)
      at process.callbackTrampoline (node:internal/async_hooks:128:17) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}`

`TypeError: fetch failed
    at node:internal/deps/undici/undici:12618:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: [Error: C0AFB780CE7F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:ssl/record/ssl3_record.c:355:
  ] {
    library: 'SSL routines',
    reason: 'wrong version number',
    code: 'ERR_SSL_WRONG_VERSION_NUMBER'
  }
}`
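All three traces share the same outer `TypeError: fetch failed` and differ only in `cause.code`, so logging that code makes it easier to tell them apart. A minimal sketch, assuming the error shape shown in the traces above; the function name `classifyFetchFailure` and the category labels are made up for this example:

```typescript
// Hypothetical labels for the three failure modes quoted in this comment.
type FetchFailureKind = "connect-timeout" | "tcp-timeout" | "tls-mismatch" | "other";

// Reads `cause.code` from a failed fetch and maps it to a coarse category.
function classifyFetchFailure(err: unknown): FetchFailureKind {
  const code = (err as { cause?: { code?: unknown } } | null | undefined)?.cause?.code;
  switch (code) {
    case "UND_ERR_CONNECT_TIMEOUT": // undici gave up waiting for the connection
      return "connect-timeout";
    case "ETIMEDOUT": // the OS-level connect() call timed out
      return "tcp-timeout";
    case "ERR_SSL_WRONG_VERSION_NUMBER": // TLS handshake got an unexpected record
      return "tls-mismatch";
    default:
      return "other";
  }
}
```

Used inside a `catch` block around `fetch`, the returned label can be logged next to the target URL to see which failure mode dominates.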

ngroenewold95 avatar Jun 10 '24 15:06 ngroenewold95

I got a response from Vercel support, here's an excerpt:

Looking at your Runtime logs, I see both Edge Functions and Serverless Functions experience 504 issues.

Without going into details, I can confirm the two runtimes use different providers, so it's unlikely to be a Vercel platform issue. I also couldn't find similar reports from other customers, which would indicate that the issue may be with your backend.

Looking at your (redacted) Serverless Function, over the past 24 hours, there were over 1000 successful invocations and 94 timeouts.

Since the issue is intermittent and this function typically resolves within 1 second, you may be able to work around the issue by implementing a retry strategy where you abort and retry the POST request if a response from your backend hasn't been received in 2 seconds.

You can also try to implement the different workarounds suggested in the Github issues below:

https://github.com/vercel/vercel/issues/11692#issuecomment-2152859828
https://github.com/vercel/next.js/issues/66373#issuecomment-2148546390

They are basically looping back into this thread :) I tried both flags, --no-experimental-fetch and --dns-result-order=ipv4first, but with the first the build fails and the second doesn't seem to do anything.

rafalzawadzki avatar Jun 10 '24 18:06 rafalzawadzki

I have had the same result with both of those flags

ngroenewold95 avatar Jun 10 '24 18:06 ngroenewold95

What build error are you getting with --no-experimental-fetch? Only after setting this flag was I able to get a sensible error, which pointed me to https://nextjs.org/docs/messages/page-data-collection-timeout which mentions: Increase the timeout by changing the config.staticPageGenerationTimeout configuration option (default 60 in seconds).

kpratik2015 avatar Jun 10 '24 19:06 kpratik2015

I'm getting a build error: headers not found

nachodeh avatar Jun 10 '24 19:06 nachodeh

Same here:

[screenshot of the build error]

I wonder if this may be because we're on an older Next/Node version? Running next (version not preserved) and node 18.x

rafalzawadzki avatar Jun 10 '24 19:06 rafalzawadzki

This is what happens when I compile with --no-experimental-fetch

[08:40:08.082] unhandledRejection ReferenceError: Headers is not defined
[08:40:08.083]     at Object.<anonymous> (/vercel/path0/node_modules/next/dist/server/web/spec-extension/adapters/headers.js:32:30)
[08:40:08.083]     at Module._compile (node:internal/modules/cjs/loader:1369:14)
[08:40:08.083]     at Module._extensions..js (node:internal/modules/cjs/loader:1427:10)
[08:40:08.083]     at Module.load (node:internal/modules/cjs/loader:1206:32)
[08:40:08.083]     at Module._load (node:internal/modules/cjs/loader:1022:12)
[08:40:08.083]     at Module.require (node:internal/modules/cjs/loader:1231:19)
[08:40:08.083]     at mod.require (/vercel/path0/node_modules/next/dist/server/require-hook.js:65:28)
[08:40:08.083]     at require (node:internal/modules/helpers:179:18)
[08:40:08.083]     at Object.<anonymous> (/vercel/path0/node_modules/next/dist/server/api-utils/index.js:67:18)
[08:40:08.083]     at Module._compile (node:internal/modules/cjs/loader:1369:14)
[08:40:08.103] Error: Command "npm run build" exited with 1
[08:40:08.791] 

I am using Next.js version: 14.2.3 and Node 20.x is enabled in vercel

ngroenewold95 avatar Jun 10 '24 19:06 ngroenewold95

Mine was Next.js 13.x with Node.js 20 set in the Vercel settings, so try Node v20. I found a similar report here: https://stackoverflow.com/questions/77594693/unhandledrejection-referenceerror-headers-is-not-defined-building-next-js-14 Or check the node-fetch docs for clues: https://stackoverflow.com/a/65766305

kpratik2015 avatar Jun 10 '24 19:06 kpratik2015

@rafalzawadzki Do you have a reference snippet to abort and retry the fetch POST after 2 seconds? This is not a solution, since the request can still fail twice, but it could reduce the number of errors.

andremendonca03 avatar Jun 11 '24 01:06 andremendonca03

the link provided by Vercel: https://developer.mozilla.org/en-US/docs/Web/API/AbortController/abort (it's a standard Web API feature)
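For reference, the abort-and-retry idea from the support reply could look roughly like the sketch below. It is an illustration under assumptions, not code from Vercel or from anyone in this thread: the name `fetchWithRetry`, the option names, and the default of two attempts are made up, and only the 2-second timeout comes from the support excerpt. It relies on the standard `AbortController` API from the link above.

```typescript
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical helper: aborts an attempt after `timeoutMs` and tries again,
// up to `attempts` times. Rethrows the last error if every attempt fails.
async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  { timeoutMs = 2000, attempts = 2, fetchImpl = fetch as FetchFn } = {},
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const controller = new AbortController();
    // Abort this attempt if no response has arrived within timeoutMs.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fetchImpl(url, { ...init, signal: controller.signal });
    } catch (err) {
      lastError = err; // aborted or failed; fall through to the next attempt
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

A call would look like `await fetchWithRetry(url, { method: "POST", body })`. Note that the timer only covers the time until the response headers arrive, and that retrying a POST can deliver the same message twice if the first attempt actually reached the backend, so this reduces errors rather than fixing the cause.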

rafalzawadzki avatar Jun 11 '24 07:06 rafalzawadzki

@kpratik2015 however hard I try I can't seem to be able to use --no-experimental-fetch flag - it just results in all sorts of build errors. I tried with different node and next versions, in prod and locally - no go.

sorry to doubt you, but have you made sure to re-deploy your project after adding this env variable to make sure it's used?

also not sure if it's a lead at all, but apparently AWS Lambda recently rolled out an upgrade to Node version with a breaking undici change that causes bugs in Netlify, Vercel, Sentry etc: https://github.com/nodejs/node/issues/53186. I wonder if that may have something to do with our issue

rafalzawadzki avatar Jun 11 '24 08:06 rafalzawadzki

Hi, we're looking at the issue but don't have any updates just yet. Any minimal reproduction that doesn't depend on hitting an API behind authentication could help a lot 🙏

https://github.com/nodejs/node/issues/53186 is very likely unrelated to this issue, since it's only about Node.js 20 while this current discussion is affecting both Node.js 18 and Node.js 20.

QuiiBz avatar Jun 11 '24 08:06 QuiiBz

@rafalzawadzki Yes, I did. It's a pretty old project and we only use a GraphQL API, so maybe I'm not running into problems that affect other kinds of fetch usage. Also, each API response in successful builds shows a maximum response time of under 1 second. Initially I tried all possible values of --dns-result-order, but that didn't help. Locally everything works without any change; it was only in the Vercel environment that it failed every time.

If it helps, this is the only way we are using fetch in getStaticProps:

type StaticFetchParams<V> = {
  query: string;
  variables?: V;
};

export const staticFetch = async <V, TData>(
  params: StaticFetchParams<V>,
  headers?: HeadersInit
) => {
  const response = await fetch(process.env.NEXT_PUBLIC_REACT_APP_API, {
    method: "POST",
    headers: headers || {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(params),
  });
  return response.json();
};

kpratik2015 avatar Jun 11 '24 09:06 kpratik2015

So this issue comes and goes for us.... It started up again last week, which seems to match this thread. Here is a comparison of the number of UND_ERR_CONNECT_TIMEOUT's we've gotten vs the general activity of our platform.

[screenshot: chart comparing UND_ERR_CONNECT_TIMEOUT counts with platform activity]

This is in production, on Vercel. These seem to mostly be internal calls, like GET /api/auth/session as well as webhooks.

ThePerryR avatar Jun 11 '24 15:06 ThePerryR

I'm experiencing this same issue on a brand new app that I imported into Vercel this morning. It fails to make a fetch to the QStash API (which is authenticated) in order to push a task. It worked for about 30 minutes, but outside of that window it has not worked. The fetch is being made from a server component.

Next 14.2.3 Node 20.x

bttf avatar Jun 11 '24 22:06 bttf

Experienced the same issue. I used a webhook, which times out in 1s. In my case, I get the error when the webhook closes the connection.

lookevink avatar Jun 14 '24 19:06 lookevink

Some updates:

  • We landed a network improvement for builds to mitigate timeout errors at build time
  • We believe that the issue at runtime is caused by a race condition regarding keep-alive settings in Undici, the library used by Node.js to provide the fetch() method. You can add a connection: 'close' header to your fetch() calls to disable keep-alive, which should help mitigate this issue
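The header suggestion in the second bullet might be applied like this. It is a small sketch only; the helper name `withConnectionClose` is hypothetical, and whether the header fully avoids the race described above is not something this example can confirm.

```typescript
// Hypothetical helper: returns a copy of the fetch options with a
// `connection: close` header added, leaving other headers untouched.
function withConnectionClose(init: RequestInit = {}): RequestInit {
  const headers = new Headers(init.headers);
  headers.set("connection", "close");
  return { ...init, headers };
}
```

Usage would be along the lines of `await fetch(url, withConnectionClose({ method: "POST", body }))`. Disabling keep-alive means each request opens a new connection, which trades some latency for avoiding reuse of a stale socket.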

QuiiBz avatar Jun 16 '24 07:06 QuiiBz

I successfully resolved the issue by configuring the undici global dispatcher in the root layout. [screenshot of the code, not preserved]
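Since the screenshot is not preserved, the exact configuration used here is unknown. The fragment below is only a guess at the general shape of such a setup, assuming the `undici` npm package is added as a dependency. `Agent` and `setGlobalDispatcher` are real undici exports, but both option values are illustrative and were not taken from the original comment.

```typescript
// Assumes `undici` is installed as a dependency; values are illustrative only.
import { Agent, setGlobalDispatcher } from "undici";

setGlobalDispatcher(
  new Agent({
    // Raise the connect timeout (undici's default is 10 seconds).
    connect: { timeout: 30_000 },
    // Hypothetical value: close idle keep-alive sockets after 1 second.
    keepAliveTimeout: 1_000,
  }),
);
```

This would need to run once on the server before the first `fetch` call, for example at module scope in a server-only file such as the root layout mentioned above.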

bacqueyrisses avatar Jun 17 '24 18:06 bacqueyrisses

My application uses axios to send requests. To my knowledge, axios does not send a keep-alive header, so I'm not sure the recommendation helps. We are on Node 18.

In our Vercel logs we are seeing:

Unhandled Rejection: TypeError: fetch failed
at node:internal/deps/undici/undici:12618:11
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async h.send (/var/task/apps/frontend/.next/server/chunks/1542.js:1:73006) {
cause: ConnectTimeoutError: Connect Timeout Error
at onConnectTimeout (node:internal/deps/undici/undici:7760:28)
at node:internal/deps/undici/undici:7716:50
at Immediate._onImmediate (node:internal/deps/undici/undici:7748:13)
at process.processImmediate (node:internal/timers:476:21)
at process.topLevelDomainCallback (node:domain:161:15)
at process.callbackTrampoline (node:internal/async_hooks:126:24) {
code: 'UND_ERR_CONNECT_TIMEOUT'
}
}
Node.js process exited with exit status: 128. The logs above can help with debugging the issue.

abhiaiyer91 avatar Jun 19 '24 21:06 abhiaiyer91

I'm getting more and more of these errors. Also using axios for requests. Can someone at vercel please prioritize this?

nachodeh avatar Jun 19 '24 22:06 nachodeh