
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded

clipboardbolaji opened this issue 2 years ago • 11 comments

Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at Object.callErrorFromStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/client.js:391:49)
    at Object.onReceiveStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /usr/src/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)
    at runNextTicks (internal/process/task_queues.js:64:3)
    at listOnTimeout (internal/timers.js:526:9)
    at processTimers (internal/timers.js:500:7)

We're experiencing the issue on our production server at intervals.

Environment details

  • OS: Linux
  • Node.js version: 14.19.1
  • npm version:
  • @google-cloud/pubsub version: ^2.18.5

Thanks!

clipboardbolaji avatar Apr 08 '22 10:04 clipboardbolaji

Getting the same error today.

weilinzung avatar Apr 22 '22 12:04 weilinzung

Not 100% sure if it is related, but the error code matches (4), and the message looks like a deadline-exceeded error 🤔

The error started happening recently for the first time and is now triggered here and there, even though the error message is a bit different:

{
  code: 4,
  GoogleError: Total timeout of API google.pubsub.v1.Publisher exceeded 600000 milliseconds before any response was received.
}
at repeat (/app/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
at Timeout._onTimeout (/app/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7) 

Environment details:

  • Docker base image: node:16-alpine
  • @google-cloud/pubsub: 2.18.3
  • Running on Cloud Run

Might be related to https://github.com/googleapis/nodejs-pubsub/issues/1442 though 🤔

mr-pascal avatar May 23 '22 11:05 mr-pascal

I got this error when I started to use Node 16. grpc-js recommends Node.js 12. :/

https://www.npmjs.com/package/@grpc/grpc-js

Crypt0Graphic avatar Jun 06 '22 17:06 Crypt0Graphic

Also seeing this error on Node 16.15; however, we also saw it intermittently with Node 14.x. Some other reports about this error point to the issue being newer versions of nodejs-pubsub, but even when we downgraded, the issue persisted.

In summary, this has been occurring intermittently for us w/:

  • node 14.x and 16.x
  • @google-cloud/pubsub versions from 2.14.x, including a few intermediate versions, up to 3.0.1
  • In Cloud Run (uncommon) and Cloud Functions (more common)

Anyone know of a work-around?

linjer avatar Jun 15 '22 23:06 linjer

@mr-pascal We are getting the same error too with nearly the same environment. Did you find a solution?

Environment details:

  • Docker base image: node:16-alpine
  • @google-cloud/pubsub: 3.0.1
  • Running on Cloud Run

gautier-gdx avatar Jun 23 '22 15:06 gautier-gdx

@gautier-gdx Not sure if it counts as a solution, but after skimming our company Slack channel again, the error seems to have disappeared after I removed CPU throttling on the Cloud Run service. So for us, the fix was:

CPU allocation: CPU is always allocated

This is absolutely fine for us, since it's actually the preferred setup for that service given its very high traffic.

At least I haven't experienced any issues since then 🤔 Might be a coincidence, or due to this change.

mr-pascal avatar Jun 24 '22 06:06 mr-pascal

@mr-pascal Thanks a lot!

CPU allocation: CPU is always allocated might not be related, because we're already running with that configuration and we're still getting the error.

gautier-gdx avatar Jun 24 '22 07:06 gautier-gdx

I'm using a pretty old version of Node and @google-cloud/pubsub and encounter the same thing.

Environment details:

  • @google-cloud/pubsub: 1.7.2
  • Node 9.5.0

bachtangthang avatar Jul 15 '22 03:07 bachtangthang

I am using Node.js v18, deployed in a Cloud Run service, and almost every time the instance is shutting down I receive this error. I can't set the CPU allocation to always allocated because it generates additional and unnecessary cost. It would be nice for this to be fixed, or for the error to be handled in a different way.

oscarojeda avatar Oct 05 '22 14:10 oscarojeda

@oscarojeda Have you already tried the newest PubSub NPM package version?

Also have you tried awaiting the Promise returned by the publish method?

Mind sharing a code snippet of what exactly you're doing?

Also, do you listen to the SIGTERM signal triggered by Cloud Run to run some shutdown logic there?
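
(For reference, the two patterns asked about above usually look roughly like the sketch below, using the 3.x publishMessage API. The topic and subscription names are hypothetical, and this is only an illustration, not a confirmed fix.)

const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const topic = pubsub.topic('my-topic');              // hypothetical topic name
const subscription = pubsub.subscription('my-sub');  // hypothetical subscription name

async function publishEvent(payload) {
  // Awaiting the publish Promise surfaces failures here instead of dropping them.
  const messageId = await topic.publishMessage({ json: payload });
  console.log(`Published ${messageId}`);
}

// Cloud Run sends SIGTERM before shutting an instance down; closing the
// subscriber here lets in-flight messages be acked or redelivered cleanly.
process.on('SIGTERM', async () => {
  await subscription.close();
  process.exit(0);
});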

mr-pascal avatar Oct 05 '22 16:10 mr-pascal

Hi @mr-pascal, I have 3.1.0. Did this issue get resolved in the latest version? I am not publishing any messages; I am receiving them from a pull subscription. I tried once to listen to the SIGTERM signal because I wanted to close and delete the subscription, but it never worked, so I removed it. Here is the code.

// Parse the Pub/Sub message and refresh the client cache on create/delete events.
const messageHandler = (message: Message) => {
  try {
    const data = JSON.parse(message.data.toString('utf-8'));
    if (['created', 'deleted'].includes(data.action)) {
      this.clients.fetch().then();
    }
    message.ack();
  } catch (e) {
    console.info(e);
  }
};

try {
  const environment = this.config.get<string>('ENVIRONMENT');
  const options = this.config.get<CreateSubscriptionOptions>('google-cloud-pubsub.subscriptionOptions');
  if (!this.clientSubscription && environment !== 'development') {
    // Create a uniquely named subscription on the 'client' topic and start listening.
    const topicId = 'client';
    this.clientSubscriptionId = 'subscription_'.concat(uuid());
    this.pubSub.topic(topicId).createSubscription(this.clientSubscriptionId, options)
      .then(subscription => {
        [this.clientSubscription] = subscription;
        Logger.log(`Subscription created: ${this.clientSubscriptionId}`, 'PubSub', false);
        this.clientSubscription.on('message', messageHandler);
      });
  }
} catch (e) {
  console.info(e);
}

Do you have any idea?

oscarojeda avatar Oct 07 '22 13:10 oscarojeda

I keep having this issue, despite using HTTP requests rather than Pub/Sub. I thought it was related to Cloud Tasks...

You may want to check this, if you use tasks too: stackoverflow issue

peergum avatar Oct 28 '22 12:10 peergum

We have a service that subscribes to PubSub events, that occasionally logs this error:

(screenshot of the DEADLINE_EXCEEDED error)

When it logs this error, event processing stops, and we have to restart the pods to get it to start again. There are no other relevant details in any logs we can find, just this error, then processing stops.

yasha-spare avatar Jan 18 '23 16:01 yasha-spare

Any update on this? We too are getting the same error in our production environment, using '@nestjs/microservices' EventPattern for pull subscriptions.

jagrajsingh-betaflux avatar Jan 30 '23 09:01 jagrajsingh-betaflux

We are also seeing this error on a NestJS project with @algoan/nestjs-google-pubsub-microservice and @nestjs/microservices

anthony-langford avatar Jan 30 '23 21:01 anthony-langford

@anthony-langford what's the ack deadline time for your subscriptions ?

jagrajsingh-betaflux avatar Jan 31 '23 04:01 jagrajsingh-betaflux

@anthony-langford what's the ack deadline time for your subscriptions ?

1 minute on one subscription and 5 minutes on the other

anthony-langford avatar Jan 31 '23 13:01 anthony-langford

Same here, grpc client logs are full of these errors:

Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at Object.callErrorFromStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/client.js:409:49)
    at Object.onReceiveStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /usr/src/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)

  • Docker: ubuntu 20.04
  • Node version: 14.21.1

"@google-cloud/pubsub": {
     "version": "2.19.4",

Anyone found a solution for this issue?

markrity avatar Feb 13 '23 14:02 markrity

Started getting the same error trying to assert 20+ topics and subscriptions from code - I'm checking topic/subscription existence with the exists() method. It worked perfectly with 10-15 topics, but now it crashes with this DEADLINE_EXCEEDED error :( Are there any rate limits or anything like that for exists() API calls?
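
(For context, the pattern being described is roughly the sketch below — the names are made up. Each startup then issues a couple of admin calls per topic/subscription pair, which is what can run into the admin-plane throttling mentioned later in this thread.)

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function ensureTopicAndSubscription(topicName, subName) {
  const topic = pubsub.topic(topicName);
  const [topicExists] = await topic.exists();              // admin "read" call
  if (!topicExists) await pubsub.createTopic(topicName);   // admin "write" call

  const sub = topic.subscription(subName);
  const [subExists] = await sub.exists();                  // admin "read" call
  if (!subExists) await topic.createSubscription(subName); // admin "write" call
}

// Doing this for 20+ topic/subscription pairs multiplies the number of
// admin calls made at startup.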

rovnyart avatar Feb 20 '23 12:02 rovnyart

Hi, I'm also getting this error.

Environment details:

  • Cloud Run
  • Docker base image: node:18-alpine
  • @google-cloud/pubsub: ^3.2.1

This happens intermittently when making unary pulls from a subscription.

manurosa avatar Apr 12 '23 18:04 manurosa

Hi, I'm also getting this error.

Environment details:

  • Cloud Run
  • Docker base image: node:14-alpine
  • @google-cloud/pubsub: ^3.4.1

This happens intermittently when making pulls from a subscription.

Please let us know if there is any update on this?

vidhyashekar avatar Apr 20 '23 05:04 vidhyashekar

Hi @clipboardbolaji, are you still facing this issue? If so, could you provide steps to reproduce it?

galz10 avatar May 15 '23 17:05 galz10

Same problem here. Node.js v18.14.2, "@google-cloud/pubsub": "^3.6.0".

const [subExists] = await subscription.exists()

Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at callErrorFromStatus (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/call.ts:81:17)
    at Object.onReceiveStatus (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/client.ts:356:55)
    at Object.onReceiveStatus (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/client-interceptors.ts:455:34)
    at Object.onReceiveStatus (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/client-interceptors.ts:417:48)
    at /Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/resolving-call.ts:111:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/client.ts:326:30)
    at ServiceClientImpl.<anonymous> (/Users/jyk/Development/repo/chatApp/websocket/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at /Users/jyk/Development/repo/chatApp/websocket/node_modules/@google-cloud/pubsub/src/v1/subscriber_client.ts:316:25
    at /Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/normalCalls/timeout.ts:54:13
    at repeat (/Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/normalCalls/retries.ts:104:19)
    at /Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/normalCalls/retries.ts:144:7
    at OngoingCall.call (/Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/call.ts:81:23)
    at NormalApiCaller.call (/Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/normalCalls/normalApiCaller.ts:43:15)
    at /Users/jyk/Development/repo/chatApp/websocket/node_modules/google-gax/src/createApiCall.ts:118:26 {
  code: 4,
  details: 'Deadline exceeded',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}

JYeop avatar May 17 '23 06:05 JYeop

At least regarding unary pull, deadline exceeded errors are going to happen from time to time and probably be even more likely when there are no messages to pull. If there are no messages, it means the request is going to wait the longest to get back a response.

Could you try making DEADLINE_EXCEEDED a retryable error for Pull? Here's a similar example for specifying custom retries for Publish
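
(A rough sketch of what that might look like for unary pull, adapted from the Publish retry sample linked above — the low-level v1.SubscriberClient is used here, and the backoff values are only illustrative.)

const { v1 } = require('@google-cloud/pubsub');

const subClient = new v1.SubscriberClient();

async function pullWithRetries(projectId, subscriptionId) {
  const formattedSubscription = subClient.subscriptionPath(projectId, subscriptionId);

  // gRPC code 4 is DEADLINE_EXCEEDED, 14 is UNAVAILABLE.
  const retrySettings = {
    retryCodes: [4, 14],
    backoffSettings: {
      initialRetryDelayMillis: 100,
      retryDelayMultiplier: 1.3,
      maxRetryDelayMillis: 60000,
      initialRpcTimeoutMillis: 60000,
      rpcTimeoutMultiplier: 1.0,
      maxRpcTimeoutMillis: 600000,
      totalTimeoutMillis: 600000,
    },
  };

  const [response] = await subClient.pull(
    { subscription: formattedSubscription, maxMessages: 10 },
    { retry: retrySettings }
  );
  return response.receivedMessages;
}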

hongalex avatar May 20 '23 01:05 hongalex

@rovnyart

Started getting the same error trying to assert 20+ topics and subscriptions from code - I'm checking topic/subscription existence with exists() method. It worked perfectly with 10-15 topics, but now it crashes with this DEADLINE_EXCEEDED error:( Are there any rate limits or stuff like that for exists API calls?

There are, actually. The admin plane calls are throttled, though I don't remember the exact limit. The way we've been recommending to deal with this, if you need to do it regularly, is to try to just open the topic or subscription for use and see if it gives you an error. Which is honestly sort of clunky and can result in lost messages if you were queueing them up on a subscription. So I might bring this up again Thursday.

feywind avatar May 29 '23 20:05 feywind

Separately: this error is really too generic to be able to do much debugging on it. The best way forward, if anyone is still having it regularly and has a support contract, is to put in a support case so we can ask the service engineers to look deeper for what's going on in your specific case.

feywind avatar May 29 '23 20:05 feywind

@rovnyart

Okay, the answer in regards to checking if a subscription exists is that you should try to open a subscriber first (e.g. pubsub.subscription('xxx')) and see if it fails (using .on('error')). If it does, then do the createSubscription(). This is admittedly a bit clunky, but messages won't be lost (I was wrong above) because they won't get acked, and will get redelivered.
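
(Interpreting that advice as code, a minimal sketch of the pattern might look like the following — the names are placeholders and the error handling is simplified.)

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

function listen(topicName, subName, onMessage) {
  const subscription = pubsub.subscription(subName);
  subscription.on('message', onMessage);
  subscription.on('error', async err => {
    // gRPC code 5 is NOT_FOUND: the subscription doesn't exist yet,
    // so create it and start listening again.
    if (err.code === 5) {
      await pubsub.topic(topicName).createSubscription(subName);
      listen(topicName, subName, onMessage);
    } else {
      console.error(err);
    }
  });
}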

There is a quota for admin operations; the "read" operations like get have a bigger quota, but it's still not infinite. So that's not really a long term tenable solution unless you are doing it very infrequently.

This should get less clunky in v2, because we're planning to separate the admin and data planes. So create will be a separate operation from opening a subscriber.

I'm going to close this, but reminder to anyone else, if you are having the deadline exceeded errors and have a support contract, please file a support case so we can look at server logs. Thanks!

feywind avatar Jun 02 '23 18:06 feywind

Thanks for the discussion above!

@feywind we have a support contract and have opened a ticket for further investigation; it's #45302224 if you want to follow along. The commonality we see with others on this thread is that we create a client for each subscription and listen to about a dozen subscriptions per Cloud Run instance. Across our 20 instances, we get about 30 errors that repeat in a regular pattern every 15 minutes; these are captured via datadog-agent. So it's possible this is intentional error throwing and handling based on the message-stream.js code.

comp615 avatar Jun 05 '23 17:06 comp615

Hi, we're still experiencing this issue. Will it help if we open another support ticket? Is there a preferred workaround we could document somewhere perhaps?

asfaltboy avatar Aug 01 '23 08:08 asfaltboy

If anyone hasn't tried turning on grpc keepalive support here, please do try that to see if it helps anything. e.g.:

For JavaScript:

const pubsubClient = new PubSub({ 'grpc.keepalive_timeout_ms': 10000, 'grpc.keepalive_time_ms': 30000 });

Or for TypeScript:

const pubsubClient = new PubSub({ 'grpc.keepalive_timeout_ms': 10000, 'grpc.keepalive_time_ms': 30000 } as any);

Might also want to force-upgrade grpc-js in your project:

npm i --save @grpc/[email protected]

These are all just workarounds, but some people are having success. We're still trying to get a root cause. You'll probably want to remove that grpc-js package dependency later so you get updates again, and we're talking about just making keepalives be an always-on thing.

feywind avatar Aug 04 '23 17:08 feywind