gax-nodejs icon indicating copy to clipboard operation
gax-nodejs copied to clipboard

Stopping retryable function if API call errored

Open secmohammed opened this issue 2 years ago • 2 comments

Recently I've been stuck with using @google-cloud/pubsub package because it was returning an error of

Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.

I've browsed a couple of issues and StackOverflow answers already, and it seemed pretty vague and irrelevant to my situation, but thanks to the stack trace, It could give me the entry point to start debugging my issue which is located here at the retryable function, when I logged the error I could actually get this output

Error: 14 UNAVAILABLE: Name resolution failed for target DNS:......

After that, I understood that I'd set the service path wrong and could solve my issue.

The problem that I could notice is with the retryable function, we don't ever catch the error at our callback to make use of and return an error if there is any, and since it's not used, we keep failing and just fallback to the maximum number of retries exceeded.

I was able to get the right error directly by setting this codeblock right before setting the canceller to null and keep repeating the request.

and now I could get the error that way Error: Error: 14 UNAVAILABLE: Name resolution failed for target DNS:....

The impact now is to return back the right error for the end-users so they can investigate their own issues correctly and stop repeating requests in vain. If that makes sense, I can open a PR with that.

Thanks!

secmohammed avatar Jan 11 '23 16:01 secmohammed

@secmohammed thank you for the investigation. I'm assigning this to @alexander-fenster, who's the subject matter expert in gax.

I think what might be happening is the retry logic attempts retrying longer than the maximum retry of the RPC you're calling? Do you get the DNS error, as expected, if you extend the timeout on the RPC?

bcoe avatar Jan 12 '23 15:01 bcoe

Hello @bcoe ! I had a confusion with the servicePath option and thought it meant the service account file path which doesn't matter how many retries we gonna consume at this point. The problem isn't about extending retries but exposing the right error so we can debug better because it never picks up the error returned from the response and keep trying in vain then return us back the timeout/retries exceeded error

secmohammed avatar Jan 12 '23 16:01 secmohammed