gax-nodejs
gax-nodejs copied to clipboard
Stopping retryable function if API call errored
Recently I've been stuck with using @google-cloud/pubsub package because it was returning an error of
Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.
I've browsed a couple of issues and StackOverflow answers already, and it seemed pretty vague and irrelevant to my situation, but thanks to the stack trace, It could give me the entry point to start debugging my issue which is located here at the retryable function, when I logged the error I could actually get this output
Error: 14 UNAVAILABLE: Name resolution failed for target DNS:......
After that, I understood that I'd set the service path wrong and could solve my issue.
The problem that I could notice is with the retryable function, we don't ever catch the error at our callback to make use of and return an error if there is any, and since it's not used, we keep failing and just fallback to the maximum number of retries exceeded.
I was able to get the right error directly by setting this codeblock right before setting the canceller to null and keep repeating the request.
and now I could get the error that way Error: Error: 14 UNAVAILABLE: Name resolution failed for target DNS:....
The impact now is to return back the right error for the end-users so they can investigate their own issues correctly and stop repeating requests in vain. If that makes sense, I can open a PR with that.
Thanks!
@secmohammed thank you for the investigation. I'm assigning this to @alexander-fenster, who's the subject matter expert in gax.
I think what might be happening is the retry logic attempts retrying longer than the maximum retry of the RPC you're calling? Do you get the DNS error, as expected, if you extend the timeout on the RPC?
Hello @bcoe ! I had a confusion with the servicePath option and thought it meant the service account file path which doesn't matter how many retries we gonna consume at this point. The problem isn't about extending retries but exposing the right error so we can debug better because it never picks up the error returned from the response and keep trying in vain then return us back the timeout/retries exceeded error