nodejs-datastore

Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds

Open jrabelo-colmeia opened this issue 2 years ago • 20 comments

Hello, we are getting this error from nodejs-datastore-sdk in our clusters in production from time to time:

Error: Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds before any response was received.

We have been seeing this error for more than a year and no one seems to know what's going on. Can you at least enlighten us on what could possibly be happening?

  1. Is this a client library issue or a product issue? It looks like a bug in nodejs-datastore-sdk

  2. Did someone already solve this?

  • Search the issues already opened: not-found
  • Search the issues on our "catch-all" repository: not-found
  • Search or ask on StackOverflow (engineers monitor these tags): not-found

Environment details

  • OS: debian10
  • Node.js version: 14.21.1
  • npm version: 9.5.0
  • @google-cloud/datastore version: 7.0.0

Steps to reproduce

  1. Unfortunately not reproducible; it happens at completely random times on our VM instances on google-cloud-platform

jrabelo-colmeia avatar Oct 18 '23 18:10 jrabelo-colmeia

@danieljbruce, not sure if this is related to b/303109029 and b/303728081

sofisl avatar Oct 19 '23 21:10 sofisl

Hello, here is a call stack that may help you troubleshoot the problem: [image]

jrabelo-colmeia avatar Oct 23 '23 13:10 jrabelo-colmeia

Hi guys, the same thing is happening to me, it's very random. It started happening today and we haven't made any changes.

  Error: Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds before any response was received.
  
  at .repeat ( /node_modules/google-gax/build/src/normalCalls/retries.js:66 )
  at .Timeout._onTimeout ( /node_modules/google-gax/build/src/normalCalls/retries.js:101 )
  at .listOnTimeout ( node:internal/timers:569 )
  at process.processTimers ( node:internal/timers:512 )

I have updated my version to see if it solved the problem but it keeps happening.

@google-cloud/datastore 8.2.1 to 8.2.2

Node version: v18.17.0 npm version: 9.6.7
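
For anyone who wants to tell this failure apart from other errors in application code, one option is to match on the message text, since the trace above only shows the message and the google-gax frames. This is a minimal sketch, assuming the wording stays as quoted above; `isTotalTimeoutError` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper (not a library API): recognizes the
// "Total timeout of API ... exceeded N milliseconds" error by its message,
// using the wording quoted in this thread.
const TOTAL_TIMEOUT_PATTERN = /Total timeout of API \S+ exceeded \d+ milliseconds/;

function isTotalTimeoutError(err) {
  return err instanceof Error && TOTAL_TIMEOUT_PATTERN.test(err.message);
}
```

A catch block around a Datastore call could use this to count or alert on these timeouts separately from other failures.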

nlbi21 avatar Nov 07 '23 02:11 nlbi21

Hello, any news or updates on this issue?

Our production servers are simply losing the Datastore connection and EVERYTHING goes down for a couple of minutes. This is a HUGE problem; if anyone has any idea how to solve this issue, we would be very grateful.

jrabelo-colmeia avatar Dec 14 '23 18:12 jrabelo-colmeia

I opened a support ticket and was told that, while the issue isn't resolved, there is a workaround. It was confirmed to be an issue with the Datastore backend and not an issue with this library, but it looks like you can now add a fallback to the options object when creating a new Datastore instance.

const {Datastore} = require('@google-cloud/datastore');

const ds = new Datastore({
  fallback: 'rest',
});

Reference PR

Unfortunately, I also need to use the @google-cloud/connect-datastore library and that is currently locked to @google-cloud/datastore v7 so I'm unable to upgrade at this time. I'm currently using our Cloud SQL database as a fallback workaround.
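
If you want to switch the workaround on and off without a code change, a small options builder is one way to do it. This is only a sketch: `buildDatastoreOptions` and the `DATASTORE_USE_REST` environment variable are names made up for illustration, and the only library option used is the `fallback: 'rest'` setting shown above.

```javascript
// Hypothetical helper: builds the options object passed to `new Datastore(...)`.
// DATASTORE_USE_REST is an assumed environment variable name, not a library setting.
function buildDatastoreOptions(env) {
  const options = {};
  if (env.DATASTORE_USE_REST === 'true') {
    // Route calls over REST, as in the workaround above.
    options.fallback = 'rest';
  }
  return options;
}

// Usage (needs a @google-cloud/datastore version that includes the referenced PR):
//   const {Datastore} = require('@google-cloud/datastore');
//   const ds = new Datastore(buildDatastoreOptions(process.env));
```

With the variable unset, the client is created with default options, so the toggle can be flipped per deployment.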

rossjs avatar Dec 19 '23 15:12 rossjs

thanks for your help @rossjs we are gonna try this fallback parameter

looker-colmeia avatar Dec 19 '23 18:12 looker-colmeia

Hi,

I've tried to add fallback: 'rest' to my Datastore object instantiation and got this error:

FetchError: Invalid response body while trying to fetch https://datastore.googleapis.com/v1/projects/[...PROJECT ID...]:lookup?$alt=json%3Benum-encoding=int: read ECONNRESET
    at Gunzip.<anonymous> ([...PROJECT LOCATION...]/node_modules/google-gax/node_modules/node-fetch/lib/index.js:400:12)
    at Gunzip.emit (node:events:525:35)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  note: 'Exception occurred in retry method that was not classified as transient'
}

patriciatrauman avatar Dec 20 '23 10:12 patriciatrauman

@patriciatrauman are you using version 8.3.0 of the datastore library? @danieljbruce is this PR https://github.com/googleapis/nodejs-datastore/pull/1203/files supposed to solve this issue?

looker-colmeia avatar Dec 20 '23 12:12 looker-colmeia

@looker-colmeia, here are the packages I use: [Screenshot 2023-12-20 at 15 34 54]. I tried to implement it like this: [Screenshot 2023-12-20 at 15 36 38]. I also tried the values true, false, and proto, and none of them worked :(

patriciatrauman avatar Dec 20 '23 14:12 patriciatrauman

@looker-colmeia The PR you mentioned is the workaround that @rossjs described.

@patriciatrauman The code snippet below using 'rest' works just fine for me. Could you provide us with a reproducible code example?

const {Datastore} = require('@google-cloud/datastore');

async function printResults() {
  const datastore = new Datastore({
    fallback: 'rest'
  });
  const kind = "key";
  const taskKey = datastore.key([kind, 1]);
  const newTask = {
    key: taskKey,
    data: {
      value: 999,
    },
  };
  await datastore.save(newTask, {});
  const [entity] = await datastore.get(taskKey);
  const returnedKey = entity[Datastore.KEY];
  console.log(returnedKey);
}

printResults().catch(console.error);

danieljbruce avatar Jan 11 '24 22:01 danieljbruce

Closing this issue since I have not heard back, but feel free to open this issue again if it persists.

danieljbruce avatar Feb 13 '24 16:02 danieljbruce

Hello, any news on this issue? REST fallback is very slow.

looker-colmeia avatar May 29 '24 20:05 looker-colmeia

Lowering priority to P3 since the issue is now just limited to REST fallback.

danieljbruce avatar Jul 17 '24 17:07 danieljbruce

"Lowering priority to P3 since the issue is now just limited to REST fallback."

I'd like to remind the senior developers of this library, and also the program managers, that THIS IS NOT A P3 PROBLEM.

REST fallback is SLOW, and we are moving away from Datastore to ScyllaDB because of the timeout/reset problems.

We are also thinking about leaving google-cloud-platform entirely, so I hope you realize how bad this problem is for your customers.

looker-colmeia avatar Jul 17 '24 17:07 looker-colmeia

"Lowering priority to P3 since the issue is now just limited to REST fallback."

Could you expand on this message? We are seeing this in production, in two completely different environments.

REST fallback is just not an option for high-load production environments because it is extremely slow, so the issue is not "limited" to REST fallback. I don't think REST fallback should even be relevant to this discussion 🤔.

The issue is clearly in the Datastore client. (I am already in contact with GCP support about this, and hopefully our requests will reach the library team.) If you want to reproduce it, I am confident that it would be enough to:

  • Spawn a GKE cluster with a pod doing read/writes to datastore
  • Spam it for a few days with a moderate amount of requests (~50 req/s)
  • You will eventually see this error
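
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a confirmed reproduction: `intervalForRate` and `runLoad` are hypothetical helpers, and `doRequest` stands in for whatever Datastore read/write the pod performs.

```javascript
// Milliseconds to wait between requests to approximate a target rate.
function intervalForRate(requestsPerSecond) {
  if (!(requestsPerSecond > 0)) {
    throw new RangeError('requestsPerSecond must be positive');
  }
  return 1000 / requestsPerSecond;
}

// Starts doRequest(i) at roughly the given rate and collects failures.
// doRequest is a caller-supplied async function, for example one
// datastore.save() followed by one datastore.get().
async function runLoad(doRequest, {requestsPerSecond, totalRequests}) {
  const intervalMs = intervalForRate(requestsPerSecond);
  const failures = [];
  const inFlight = [];
  for (let i = 0; i < totalRequests; i++) {
    inFlight.push(
      Promise.resolve()
        .then(() => doRequest(i))
        .catch(err => {
          // Record the failure instead of aborting the whole run.
          const message = err instanceof Error ? err.message : String(err);
          failures.push({index: i, message});
        })
    );
    // Pace the loop; requests are not awaited, so slow calls overlap.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  await Promise.all(inFlight);
  return failures;
}
```

At the ~50 req/s mentioned above, `intervalForRate(50)` gives a 20 ms gap between requests; the returned `failures` array can then be checked for the timeout message.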

klaa97 avatar Aug 07 '24 18:08 klaa97

We are also using google-cloud/spanner and google-cloud/big-table and have no RST problems at all with them. Maybe reading the spanner and big-table codebases would help solve this problem.

looker-colmeia avatar Aug 21 '24 18:08 looker-colmeia

Have the same issue using version 9.1.0

alinalexandru avatar Oct 18 '24 09:10 alinalexandru

Also ran into this today internally on one instance; the rest of them are fine. It feels like some form of gRPC handling bug where it loses the connection and is unable to reconnect, and then the queue of things to process just gets stuck.

I wish we could reproduce this more easily so we could contribute some form of fix, as P3 seems pretty low for something that is pretty severely broken. REST is somewhat OK for our use case, 15-20 RPS, but we are seeing 500ms latency now instead of <150ms.

Also, the term fallback for configuring this seems wrong. It's not a fallback if it uses REST all the time; I'd expect it to prefer gRPC and only try REST once that fails.
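
To make that expectation concrete, here is a sketch of "prefer one transport, fall back to another" done in application code. It is not how the library's `fallback` option behaves and not a library API: `withFallback` is a hypothetical wrapper around two caller-supplied async functions.

```javascript
// Hypothetical wrapper: call `primary` first and only call `secondary`
// when `shouldFallback(err)` says the primary failure qualifies.
function withFallback(primary, secondary, shouldFallback) {
  return async (...args) => {
    try {
      return await primary(...args);
    } catch (err) {
      if (!shouldFallback(err)) throw err;
      return secondary(...args);
    }
  };
}

// Usage, assuming two clients created as shown earlier in this thread:
//   const grpcDs = new Datastore();
//   const restDs = new Datastore({fallback: 'rest'});
//   const get = withFallback(
//     key => grpcDs.get(key),
//     key => restDs.get(key),
//     err => /Total timeout of API/.test(err.message)
//   );
```

One caveat with this approach: with the timeout reported in this issue, the primary call would only give up after 60000 milliseconds, so the REST attempt starts very late unless a shorter timeout is configured.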

benjdlambert avatar Mar 06 '25 11:03 benjdlambert

We made a change recently to expose the original error that occurs with https://github.com/googleapis/gax-nodejs/pull/1740/files instead of just reporting the [Total timeout of API google.datastore.v1.Datastore exceeded 60000 milliseconds] error. If you upgrade nodejs-datastore to the latest version then you should see Previous Errors in the error message. Please paste the error message you see here.
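
For those pasting errors here, a small helper can capture the relevant fields in one place. This is a minimal sketch: `describeError` is a hypothetical helper, and the `code` and `note` fields are the ones visible on the FetchError quoted earlier in this thread, so they may be undefined on other errors.

```javascript
// Hypothetical helper: collects the fields worth pasting into a bug report.
// `code` and `note` appear on the FetchError quoted earlier in this thread;
// any of them may be undefined on a given error.
function describeError(err) {
  return {
    name: err.name,
    message: err.message,
    code: err.code,
    note: err.note,
  };
}

// Usage around any Datastore call:
//   try {
//     await datastore.get(key);
//   } catch (err) {
//     console.error(JSON.stringify(describeError(err)));
//     throw err;
//   }
```

Logging the full `message` matters here because, per the comment above, that is where the previous errors are reported after upgrading.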

danieljbruce avatar May 20 '25 17:05 danieljbruce

@danieljbruce awesome, thanks

We migrated 50% of our data to ScyllaDB and the timeouts disappeared. You should make the real limits of Datastore more explicit in the docs.

looker-colmeia avatar May 20 '25 17:05 looker-colmeia