
Log ingestion FetchError

Open JannikWempe opened this issue 2 years ago • 16 comments

Hi folks 👋🏼

We are using [email protected]. Hosting on Vercel with the Axiom integration installed.

We are seeing a lot of FetchErrors (reason: write EPROTO 139882530146240:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:331) when sending to https://vercel-vitals.axiom.co/api/v1/send?configurationId=icfg_[OMITTED]&projectId=[OMITTED]&type=logs from getStaticProps (hit by an on-demand revalidation call, not during build time).

(screenshot attached)

What could be the cause of this? I am not even sure if it is an issue on our side or on Axiom's? 🤔 I can't find anything about using Axiom in getStaticProps.

Any hints or tips?

PS: I have also asked this in Discord.

EDIT: This is the high-level code, if it helps:

export function withIsr<
  // ...
>(handler: IsrGetStaticProps<P, Q, D>, options: WithIsrOptions): GetStaticProps<P, Q, D> {
  return async (context) => {
    // ...
    const logger = new Logger(
      {
        // ...
      },
      undefined,
      false,
      'lambda',
    );

    // ...

    try {
      const result = await handler(extendedContext);
      await logger.flush();
      return result;
    } catch (error) {
      logger.error('Error in ISR getStaticProps', { error });
      await logger.flush();
      throw error;
    }
  };
}

JannikWempe avatar Jun 28 '23 13:06 JannikWempe

I am having the same issue with [email protected] (Techstack: Next.js, tRPC and Vercel).

FetchError: request to https://vercel-vitals.axiom.co/api/v1/send?configurationId=censored&projectId=censored&type=logs failed, reason: write EPROTO 140399046711232:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:331:
    at ClientRequest.<anonymous> (/var/task/node_modules/next/dist/compiled/node-fetch/index.js:1:65756)
    at ClientRequest.emit (node:events:513:28)
    at TLSSocket.socketErrorListener (node:_http_client:494:9)
    at TLSSocket.emit (node:events:513:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  type: 'system',
  errno: 'EPROTO',
  code: 'EPROTO'
}

jhb-dev avatar Jul 03 '23 09:07 jhb-dev

I am wondering if it's because the runtime is shut down while the HTTP client is still trying to handshake with Axiom. Can you try calling await log.flush() before your function returns and see if that solves it?
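
For illustration, a minimal sketch of where that flush could go, assuming a pages-router API route (the handler below is hypothetical; the Logger constructor arguments are omitted, as in the snippet above):

// Hypothetical API route, only to show flush placement before the function returns.
import type { NextApiRequest, NextApiResponse } from 'next';
import { Logger } from 'next-axiom';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const log = new Logger(); // constructor arguments omitted; see the snippet above
  try {
    log.info('handling request', { path: req.url });
    res.status(200).json({ ok: true });
  } finally {
    // Wait for the pending ingest request to complete before the lambda is suspended.
    await log.flush();
  }
}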

dasfmi avatar Jul 05 '23 15:07 dasfmi

We had the same error when using Node 16. We upgraded to Node 18 and we are now seeing a new error (that is most probably due to Axiom):

TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11457:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:8422:28)
      at node:internal/deps/undici/undici:8380:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:8411:13)
      at process.processImmediate (node:internal/timers:476:21)
      at process.topLevelDomainCallback (node:domain:161:15)
      at process.callbackTrampoline (node:internal/async_hooks:128:24) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}

We are also hosted on Vercel

ValentinH avatar Jul 17 '23 07:07 ValentinH

> We had the same error when using Node 16. We upgraded to Node 18 and we are now seeing a new error (that is most probably due to Axiom): TypeError: fetch failed, cause: ConnectTimeoutError (UND_ERR_CONNECT_TIMEOUT). We are also hosted on Vercel.

Same here. We also just recently updated to Node 18 and now see the same error.

JannikWempe avatar Jul 17 '23 08:07 JannikWempe

Can one of you provide a minimal reproducible example? Is this a one-time error, or is it consistent?

dasfmi avatar Jul 18 '23 16:07 dasfmi

On our side, it's happening multiple times per day (multiple times per hour sometimes). Our functions still return fine so I'm guessing that the error could happen when Axiom is trying to emit the logs while the function is shutting down.

Regarding the minimal reproducible example, it's pretty hard to provide one as the error seems to be random.

ValentinH avatar Jul 19 '23 07:07 ValentinH

I've been experiencing this error for months now (it started the day I set up the Axiom integration). It seems entirely harmless, aside from the fact that my logs are riddled with these error messages. And yes, it does seem to be random.

I asked about it in discord back in March: https://discord.com/channels/1065957163161370664/1065957163933114411/1082348198959534170

It would be really great if Axiom could find a fix for this.

trevorharwell avatar Aug 09 '23 06:08 trevorharwell

I am seeing this error in my logs too. Reinitializing the integration didn't solve the problem. I have recently upgraded to Next.js 13 (currently v13.4.16) and next-axiom v0.18. My system works, but I am seeing these errors in the production log from time to time (5-10 times daily). They are all coming from my API routes. The system is deployed on Vercel.

TypeError: fetch failed at Object.fetch (node:internal/deps/undici/undici:11576:11)

Probably related:

  • https://github.com/axiomhq/next-axiom/issues/128
  • https://github.com/vercel/next.js/issues/44062
  • https://github.com/vercel/next.js/issues/49896

vajdagabor avatar Aug 28 '23 13:08 vajdagabor

Probably also related:

  • https://github.com/nodejs/undici/issues/1248

There are some suggestions in there that the issue might be DNS-related. That could explain its very inconsistent nature.

haydn avatar Aug 31 '23 07:08 haydn

@vajdagabor what is the cause field showing for that error? It'll show the actual issue you're hitting.

ImLunaHey avatar Sep 14 '23 04:09 ImLunaHey

@ImLunaHey this is the full message, including cause (IP address at Vercel redacted with ********):

TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: Error: connect ETIMEDOUT ********:443
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
      at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
    errno: -110,
    code: 'ETIMEDOUT',
    syscall: 'connect',
    address: '********',
    port: 443
  }
}

vajdagabor avatar Sep 14 '23 04:09 vajdagabor

Possibly related https://github.com/nodejs/undici/issues/1531

Are any of you able to reproduce this on node 20?

ImLunaHey avatar Sep 14 '23 04:09 ImLunaHey

@ImLunaHey Node 18.x is the highest version currently available on Vercel (and it is also the recommended setting, as 16.x and below are deprecated).

vajdagabor avatar Sep 14 '23 05:09 vajdagabor

Also possibly related: https://github.com/vercel/next.js/issues/48744, and this may have a "fix": https://medium.com/@kaloyan_17221/fix-vercel-next-js-fetch-failed-from-undici-polyfill-8c66346c9c2f
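
For what it's worth, here is a sketch of the kind of workaround that targets the UND_ERR_CONNECT_TIMEOUT shown above (not necessarily what the linked article describes; sendLogs is a hypothetical helper, while Agent, fetch, and the dispatcher option are undici's public API):

// Sketch: use undici's fetch with an Agent that allows a longer connect timeout.
// The 30s value is arbitrary and only meant to illustrate the knob.
import { Agent, fetch } from 'undici';

const dispatcher = new Agent({
  connect: { timeout: 30_000 },
});

export async function sendLogs(url: string, body: string): Promise<void> {
  const res = await fetch(url, { method: 'POST', body, dispatcher });
  if (!res.ok) {
    throw new Error(`ingest failed with status ${res.status}`);
  }
}

Whether something like this could be applied to next-axiom's own ingest calls is up to the library, since it controls the fetch it uses.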

ImLunaHey avatar Sep 14 '23 11:09 ImLunaHey

I'd like to share an update, as we are still experiencing this error. Yesterday I noticed that some logs were missing even though I'm sure the corresponding functions executed correctly.

First, we are using Node 20.x on Vercel. Here are the logs we can see (screenshot attached), and here is proof that these logs are related to Axiom (screenshot attached).

This error doesn't seem to only be affecting Axiom as per this issue: https://github.com/vercel/vercel/issues/11692

However, I think that it would be great to have a retry strategy when this error happens. What do you think?
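
For illustration, the retry could be as small as a backoff loop around the ingest call (a sketch only; this helper is not part of next-axiom, and the attempt count and delays are arbitrary):

// Hypothetical retry helper around a log-ingest call.
async function sendWithRetry(send: () => Promise<void>, attempts = 3): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await send(); // e.g. the fetch to the ingest endpoint
      return;
    } catch (err) {
      if (i === attempts - 1) throw err;
      // Exponential backoff before retrying what is likely a transient network failure.
      await new Promise((resolve) => setTimeout(resolve, 200 * 2 ** i));
    }
  }
}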

Not being able to trust our logs is a deal breaker. We look like amateurs when we have to tell our stakeholders "we don't know what happened" while debugging production issues because logs are missing 😭

ValentinH avatar Jul 23 '24 09:07 ValentinH

I also just found these logs (screenshot attached).

ValentinH avatar Jul 23 '24 09:07 ValentinH