nodejs-datastore Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds

Hello, we are getting this error from nodejs-datastore-sdk in our clusters in production from time to time:

Error: Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds before any response was received.

We have been getting this error for more than a year and no one seems to know what's going on. Can you at least tell us what could be happening?

Is this a client library issue or a product issue? Yes, it looks like a bug in nodejs-datastore-sdk
Did someone already solve this?

Search the issues already opened: not-found
Search the issues on our "catch-all" repository: not-found
Search or ask on StackOverflow (engineers monitor these tags): not-found

Environment details

OS: debian10
Node.js version: 14.21.1
npm version: 9.5.0
@google-cloud/datastore version: 7.0.0

Steps to reproduce

Unfortunately not reproducible; it happens at completely random times on our VM instances on google-cloud-platform

Oct 18 '23 18:10 jrabelo-colmeia

@danieljbruce, not sure if this is related to b/303109029 and b/303728081

Oct 19 '23 21:10 sofisl

hello guys, here is a callstack that can help you guys to troubleshoot the problem:

Oct 23 '23 13:10 jrabelo-colmeia

Hi guys, the same thing is happening to me, it's very random. It started happening today and we haven't made any changes.

  Error: Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds before any response was received.
  
  at .repeat ( /node_modules/google-gax/build/src/normalCalls/retries.js:66 )
  at .Timeout._onTimeout ( /node_modules/google-gax/build/src/normalCalls/retries.js:101 )
  at .listOnTimeout ( node:internal/timers:569 )
  at process.processTimers ( node:internal/timers:512 )

I have updated my version to see if it solved the problem but it keeps happening.

@google-cloud/datastore 8.2.1 to 8.2.2

Node version: v18.17.0
npm version: 9.6.7

Nov 07 '23 02:11 nlbi21

Hello, are there any updates on this issue?

Our production servers are simply losing their Datastore connection and EVERYTHING goes down for a couple of minutes. This is a HUGE problem; if anyone has any idea how to solve it, we would be very grateful.

Dec 14 '23 18:12 jrabelo-colmeia

I opened a support ticket and was told that, while the issue isn't resolved, there is a workaround. It was confirmed to be an issue with the Datastore backend and not an issue with this library, but it looks like you can now add a fallback to the options object when creating a new Datastore instance.

const ds = new Datastore({
  fallback: 'rest',
});

Reference PR

Unfortunately, I also need to use the @google-cloud/connect-datastore library and that is currently locked to @google-cloud/datastore v7 so I'm unable to upgrade at this time. I'm currently using our Cloud SQL database as a fallback workaround.

Dec 19 '23 15:12 rossjs

Thanks for your help, @rossjs. We are going to try this fallback parameter.

Dec 19 '23 18:12 looker-colmeia

Hi,

I tried adding fallback: 'rest' to my Datastore object instantiation and got this error:

  FetchError: Invalid response body while trying to fetch https://datastore.googleapis.com/v1/projects/[...PROJECT ID...]:lookup?$alt=json%3Benum-encoding=int: read ECONNRESET
  at Gunzip.<anonymous> ([...PROJECT LOCATION...]/node_modules/google-gax/node_modules/node-fetch/lib/index.js:400:12)
  at Gunzip.emit (node:events:525:35)
  at emitErrorNT (node:internal/streams/destroy:151:8)
  at emitErrorCloseNT (node:internal/streams/destroy:116:3)
  at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
    type: 'system',
    errno: 'ECONNRESET',
    code: 'ECONNRESET',
    note: 'Exception occurred in retry method that was not classified as transient'
  }

Dec 20 '23 10:12 patriciatrauman

@patriciatrauman are you using version 8.3.0 of the datastore library? @danieljbruce is this PR https://github.com/googleapis/nodejs-datastore/pull/1203/files supposed to solve this issue?

Dec 20 '23 12:12 looker-colmeia

@looker-colmeia, here are the packages I use: [screenshot, 2023-12-20 15:34:54]. I tried to implement it like this: [screenshot, 2023-12-20 15:36:38]. I also tried the values true, false, and proto, and none of them worked :(

Dec 20 '23 14:12 patriciatrauman

@looker-colmeia The PR you mentioned is the workaround as @rossjs mentioned.

@patriciatrauman The code snippet below using 'rest' works just fine for me. Could you provide us with a reproducible code example?

const {Datastore} = require('@google-cloud/datastore');

async function printResults() {
  const datastore = new Datastore({
    fallback: 'rest'
  });
  const kind = "key";
  const taskKey = datastore.key([kind, 1]);
  const newTask = {
    key: taskKey,
    data: {
      value: 999,
    },
  };
  await datastore.save(newTask, {});
  const [entity] = await datastore.get(taskKey);
  const returnedKey = entity[Datastore.KEY];
  console.log(returnedKey);
}

printResults();

Jan 11 '24 22:01 danieljbruce

Closing this issue since I have not heard back, but feel free to open this issue again if it persists.

Feb 13 '24 16:02 danieljbruce

hello, any news on this issue??? rest fallback is very slow

May 29 '24 20:05 looker-colmeia

Lowering priority to P3 since the issue is now just limited to REST fallback.

Jul 17 '24 17:07 danieljbruce

Lowering priority to P3 since the issue is now just limited to REST fallback.

I'd like to remind senior-developers of this lib and also program-managers that THIS IS NOT A P3 PROBLEM

REST fallback is SLOW and we are moving away from Datastore and going to ScyllaDB because of timeout/reset problems

We are also thinking about leaving the entire google-cloud-platform, so I hope you guys realize how bad this problem is for your customers

Jul 17 '24 17:07 looker-colmeia

Lowering priority to P3 since the issue is now just limited to REST fallback.

Could you expand on this message? We are seeing this in production, in two completely different environments.

REST fallback is just not an option for high-load production environments, it's extremely slow - so the issue is not "limited" to REST fallback - I think it should not even be relevant in this discussion to talk about REST fallback 🤔 .

The issue is clearly in the Datastore client - (I am already in contact with GCP support for this and hopefully our requests will reach library team). If you want to reproduce it, I am positive about the fact that it would be enough to:

Spawn a GKE cluster with a pod doing read/writes to datastore
Spam it for a few days with a moderate amount of requests (~50 req/s)
You will eventually see this error
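
A rough sketch of what such a load script might look like is below. It is only an illustration of the steps above, not a confirmed reproduction: `runLoad`, its options, and the `LoadTest` kind are hypothetical names, and the client is passed in so the loop can be tried against a stub before pointing it at a real Datastore instance.

```javascript
// Hypothetical load generator (not from the library): alternates writes and
// reads at a fixed rate and tallies failures by error code.
// `client` is assumed to expose key(), save() and get() like a Datastore instance.
async function runLoad(client, {ratePerSecond = 50, durationMs = 60000, kind = 'LoadTest'} = {}) {
  const intervalMs = 1000 / ratePerSecond;
  const deadline = Date.now() + durationMs;
  const stats = {ok: 0, failed: 0, errors: {}};
  const pending = new Set();
  let i = 0;

  while (Date.now() < deadline) {
    const n = i++;
    const key = client.key([kind, (n % 100) + 1]);
    // Even iterations write, odd iterations read; the async wrapper turns
    // synchronous throws into rejections so the loop keeps running.
    const op = (async () =>
      n % 2 === 0 ? client.save({key, data: {value: n}}) : client.get(key))()
      .then(
        () => { stats.ok += 1; },
        err => {
          stats.failed += 1;
          const label = String(err.code ?? err.message);
          stats.errors[label] = (stats.errors[label] || 0) + 1;
        }
      )
      .finally(() => pending.delete(op));
    pending.add(op);
    // Pace the loop so requests go out at roughly ratePerSecond.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  await Promise.all(pending);
  return stats;
}
```

With the real library, `client` would be a `new Datastore()` from `@google-cloud/datastore` and `durationMs` would cover days rather than a minute. Logging `stats.errors` periodically should show when the timeouts start.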

Aug 07 '24 18:08 klaa97

we are also using google-cloud/spanner and google-cloud/big-table and we have no RST problems at all, maybe reading both spanner and big-table codebases would be helpful to solve this problem

Aug 21 '24 18:08 looker-colmeia

Have the same issue using version 9.1.0

Oct 18 '24 09:10 alinalexandru

Also ran into this today internally on one instance; the rest of them are fine. Feels like some form of gRPC handling bug where it loses the connection, is unable to reconnect, and then the queue of things to process just gets stuck.

Wish we could reproduce this more easily so we could contribute some form of fix, as P3 seems pretty low for something which is pretty severely broken. REST is somewhat OK for our use case, 15-20 RPS, but we are seeing 500ms latency now instead of <150ms.

Also the terminology fallback for configuring this seems wrong? It's not a fallback if it uses REST all the time, I'd expect it to prefer GRPC but once it fails, then try REST?
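
To make that expectation concrete, here is a minimal sketch of a "prefer gRPC, then try REST" wrapper done in application code. This is not something the library provides as far as this thread shows: `withRestFallback` and `isTransportFailure` are hypothetical helpers, and the assumption that the timeout error carries gRPC status code 4 (DEADLINE_EXCEEDED) should be checked against the errors you actually see, which is why the message is matched as well.

```javascript
// gRPC status codes treated as "the transport is stuck" (assumption, adjust as needed).
const DEADLINE_EXCEEDED = 4;
const UNAVAILABLE = 14;

// Hypothetical predicate: decides whether an error justifies retrying over REST.
function isTransportFailure(err) {
  return (
    err.code === DEADLINE_EXCEEDED ||
    err.code === UNAVAILABLE ||
    /Total timeout of API/.test(String(err.message))
  );
}

// Hypothetical wrapper: calls `method` on the primary (gRPC) client and, only if
// that fails with a transport-style error, repeats the call on the secondary
// (REST) client. Other errors are rethrown unchanged.
function withRestFallback(primary, secondary, shouldFallBack = isTransportFailure) {
  return async function call(method, ...args) {
    try {
      return await primary[method](...args);
    } catch (err) {
      if (!shouldFallBack(err)) throw err;
      return secondary[method](...args);
    }
  };
}
```

Usage would be along the lines of `const call = withRestFallback(new Datastore(), new Datastore({fallback: 'rest'}))` followed by `await call('get', key)`. Two caveats: the REST attempt only starts after the first call has already timed out, so the worst-case latency gets longer, and repeating a write is only safe when the operation is idempotent.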

Mar 06 '25 11:03 benjdlambert

We made a change recently to expose the original error that occurs with https://github.com/googleapis/gax-nodejs/pull/1740/files instead of just reporting the [Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds] error. If you upgrade nodejs-datastore to the latest version then you should see Previous Errors in the error message. Please paste the error message you see here.
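
One possible way to capture the full error for pasting is sketched below. `describeError` is a hypothetical helper, not part of the library; it simply serializes the fields that appear on the errors quoted in this thread (`code`, `message`, `note`, and the stack), and fields that are absent are omitted from the output.

```javascript
// Hypothetical helper: turns a caught error into a JSON string that is easy to
// paste into an issue. Undefined fields are dropped by JSON.stringify.
function describeError(err) {
  return JSON.stringify(
    {
      name: err.name,
      code: err.code,
      message: err.message, // the full message is kept, including any appended details
      note: err.note,
      stack: err.stack,
    },
    null,
    2
  );
}
```

It could be called from a `catch` block around any Datastore call, for example `console.error(describeError(err))`.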

May 20 '25 17:05 danieljbruce

@danieljbruce awesome, thanks

We migrated 50% of our data to ScyllaDB and the timeouts disappeared. You guys should make the real limits of Datastore more explicit in the docs.

May 20 '25 17:05 looker-colmeia