Adding more context to error stack traces

Open pratik151192 opened this issue 1 year ago • 0 comments

Problem description

Hey,

We've been investigating DEADLINE_EXCEEDED errors that our clients are facing at Momento. Our clients receive these periodically, and a few changes on our side have drastically reduced their volume. Most of these improvements involved keepalive settings and/or other connection settings. We've mostly been enabling gRPC traces (subchannel, dns_resolver, resolving_load_balancer, keepalive), which have been very helpful.
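For anyone following along, here is a minimal sketch of the kind of setup described above. GRPC_TRACE and GRPC_VERBOSITY are the environment variables grpc-js reads for trace logging, and the `grpc.keepalive_*` keys are standard channel options. The helper name `grpcTraceEnv` and the numeric keepalive values are assumptions for illustration only, not the settings we actually ship.

```typescript
// Tracer names exactly as listed in this issue.
const ISSUE_TRACERS: string[] = [
  "subchannel",
  "dns_resolver",
  "resolving_load_balancer",
  "keepalive",
];

// Hypothetical helper: builds the environment variables that turn on the
// selected grpc-js tracers for a process.
function grpcTraceEnv(tracers: string[] = ISSUE_TRACERS): Record<string, string> {
  return {
    GRPC_TRACE: tracers.join(","),
    GRPC_VERBOSITY: "DEBUG",
  };
}

// Keepalive channel options. The keys are real channel arguments; the values
// are placeholders (assumptions) and should be tuned per deployment.
const keepaliveChannelOptions: Record<string, number> = {
  "grpc.keepalive_time_ms": 5000,
  "grpc.keepalive_timeout_ms": 1000,
  "grpc.keepalive_permit_without_calls": 1,
};
```

The returned object can be merged into `process.env` before the client is created, and the options object can be passed as the channel options argument when constructing a grpc-js client.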

Our clients still see bursts of these errors that last up to a minute and then resolve on their own. We have been gathering event loop and CPU metrics from the clients to check for resource contention in the Node.js process. Meanwhile, we are curious whether there are other ways to debug the errors apart from the methods mentioned above. We are now at a point where no gRPC trace logs appear around the failing requests, which indicates that no reconnections took place. Since these logs are verbose, we do not want clients to enable all of the traces, as it impacts their monitoring bills.
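As an illustration of the event loop metrics mentioned above, the following is a rough sketch using `monitorEventLoopDelay` from Node's built-in `perf_hooks` module. The function name `startLoopDelayMonitor`, the snapshot shape, and the 10 ms resolution are assumptions for this example, not what our SDK actually does.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

interface LoopDelaySnapshot {
  meanMs: number;
  p99Ms: number;
  maxMs: number;
}

// Hypothetical helper: starts sampling event-loop delay and returns functions
// to read (and reset) the histogram and to stop sampling.
function startLoopDelayMonitor(resolutionMs: number = 10): {
  read: () => LoopDelaySnapshot;
  stop: () => void;
} {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();

  return {
    read: (): LoopDelaySnapshot => {
      // The histogram reports nanoseconds; convert to milliseconds.
      const snapshot: LoopDelaySnapshot = {
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      };
      // Reset so the next read covers only the following interval.
      histogram.reset();
      return snapshot;
    },
    stop: (): void => {
      histogram.disable();
    },
  };
}
```

Calling `read()` whenever a DEADLINE_EXCEEDED error is caught and logging the snapshot next to it makes it possible to see whether a burst of deadline errors lines up with a stalled event loop.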

This ticket is primarily about stack traces (similar to this), which I believe can help us figure out at what point the deadline was hit. Going by a ticket on the Java SDK, it seems we can answer a few questions based on the stack trace. Currently the stack traces stop at the onReceiveStatus method of the underlying gRPC library. I'd expect some frames from the request lifecycle showing where the call actually failed.
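To make the request concrete, here is a purely hypothetical sketch of the kind of context that would help: an error that records the last lifecycle phase a call reached before its deadline fired. The phase names, `CallTimeline`, and `DeadlineContextError` are invented for illustration and do not correspond to grpc-js internals.

```typescript
// Hypothetical lifecycle phases; illustrative names only.
type CallPhase =
  | "created"
  | "name_resolution"
  | "connecting"
  | "request_sent"
  | "awaiting_response";

// Error carrying the last phase reached and the time elapsed since call start.
class DeadlineContextError extends Error {
  readonly lastPhase: CallPhase;
  readonly elapsedMs: number;

  constructor(lastPhase: CallPhase, elapsedMs: number) {
    super(
      `4 DEADLINE_EXCEEDED: Deadline exceeded (last phase: ${lastPhase}, after ${elapsedMs}ms)`
    );
    this.name = "DeadlineContextError";
    this.lastPhase = lastPhase;
    this.elapsedMs = elapsedMs;
  }
}

// Tracks which phase a single call is in so a deadline error can report it.
class CallTimeline {
  private phase: CallPhase = "created";
  private readonly startedAt: number = Date.now();

  advance(phase: CallPhase): void {
    this.phase = phase;
  }

  toDeadlineError(): DeadlineContextError {
    return new DeadlineContextError(this.phase, Date.now() - this.startedAt);
  }
}
```

With something along these lines, a deadline that fires while the call is still in "connecting" would be distinguishable from one that fires while awaiting the server's response, without turning on verbose traces.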

We'd love to hear any other ideas you may have for us to debug client-side timeouts/deadline exceeded errors better!

Reproduction steps

  • Set a very low timeout value on any gRPC-backed service; the stack trace always looks similar to the following (I'd expect more frames from grpc-js):
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
        at callErrorFromStatus (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/call.ts:82:17)
        at Object.onReceiveStatus (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/client.ts:360:55)
        at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/call-interface.ts:149:27
        at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/grpc/middlewares-interceptor.ts:105:40
        at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/grpc/middlewares-interceptor.ts:158:19
        at processTicksAndRejections (node:internal/process/task_queues:95:5)
    for call at
        at ScsClient.makeUnaryRequest (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/client.ts:325:42)
        at ScsClient.Set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
        at ScsClient.Set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/node_modules/@gomomento/generated-types/dist/cacheclient.js:12394:30)
        at /Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:354:38
        at new Promise (<anonymous>)
        at CacheDataClient.sendSet (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:353:18)
        at CacheDataClient.set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/src/internal/cache-data-client.ts:338:23)
        at CacheClient.set (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/core/src/internal/clients/cache/AbstractCacheClient.ts:200:25)
        at Object.<anonymous> (/Users/pratik/sandbox/js2/client-sdk-javascript/packages/client-sdk-nodejs/test/integration/cache-client-close.test.ts:18:20)

Environment

  • OS name, version and architecture: [e.g. Linux Ubuntu 18.04 amd64]: Linux/MacOS
  • Node version [e.g. 8.10.0]: 20
  • Package name and version [e.g. [email protected]]: [email protected]

pratik151192 · Mar 13 '24 13:03