[core-http] or [search-documents] Retry on ENOTFOUND
Occasionally our Azure Function app, which uses search-documents, does not seem to be able to resolve the search endpoint:
getaddrinfo ENOTFOUND ***.search.windows.net","errStack":"RestError: request to https://***.search.windows.net/indexes(%27****%27)/docs/search.post.search?api-version=2020-06-30 failed,
reason: getaddrinfo ENOTFOUND ****.search.windows.net
at new RestError (C:\\home\\site\\wwwroot\\node_modules\\@azure\\core-http\\dist\\index.js:2390:28)\n at NodeFetchHttpClient.<anonymous> (C:\\home\\site\\wwwroot\\node_modules\\@az
search-documents already has a retry function in place, but it only retries based on certain response codes.
It would be nice if it would also retry in case of DNS ENOTFOUND errors.
@restfulhead Thanks for reporting this issue. I have looked into it. As you rightly pointed out, the search-documents SDK has retry logic, which is invoked for 3 status codes:
- 422
- 409
- 503
Now, if there is a strong justification to add a new code for DNS ENOTFOUND (BTW what status code do you want to add?) we could do it. But, in this case, if the DNS entry is not found, it does not look like a transient issue that might get resolved on retry. Maybe this is an actual service issue that should be handled by the search-service team. Please share your thoughts.
[...] if the DNS entry is not found, it does not look like a transient issue that might get resolved on retry.
For us it's definitely a transient issue, because it happens once every day or two and usually only for a single request. Subsequent requests work again. During this time there have been no changes on our end. The same app does many other web service requests, but the ENOTFOUND error only happens with Azure Search. Therefore I think a retry would help us here.
[...] issue that should be handled by the search-service team.
Yeah, I was also thinking of opening a support ticket with them. We have 3 nodes, so maybe this happens when they switch nodes or upgrade or something. However, in the past support came back with "try implementing retry" (not related to search, it was something else). So before going down that road, I was wondering if a retry could be implemented here.
BTW what status code do you want to add?
Not a status code, because there is no response. The request fails, because of the DNS issue. What would be needed is to catch the RestError, check for ENOTFOUND in the message and then retry.
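Until the SDK handles this itself, the approach described above can be sketched at the application level. This is only a minimal sketch: `isDnsNotFound` and `withDnsRetry` are hypothetical helper names, not part of @azure/search-documents or @azure/core-http, and the retry count and backoff values are arbitrary assumptions.

```typescript
// Hypothetical helpers; not part of any Azure SDK package.

/** True when the error looks like a DNS resolution failure. */
function isDnsNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  // Node.js system errors carry the code directly.
  if (e.code === "ENOTFOUND") return true;
  // Otherwise fall back to the message check suggested in this thread.
  return typeof e.message === "string" && e.message.includes("ENOTFOUND");
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `operation` and retries it only when it fails with ENOTFOUND,
 * for at most `maxRetries` additional attempts (assumed default: 3),
 * doubling the delay after each failed attempt.
 */
async function withDnsRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 200
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Any other error, or an exhausted retry budget, is rethrown unchanged.
      if (attempt >= maxRetries || !isDnsNotFound(err)) throw err;
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
}
```

A call would then look something like `await withDnsRetry(() => searchClient.search("query"))`, where `searchClient` is an already constructed `SearchClient`. Errors other than ENOTFOUND still surface immediately, so the SDK's own status-code retries are not affected.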
Thanks @restfulhead for the quick response. With your explanation, I also believe adding such a code would definitely be helpful in your case. Let me start working on this and update you. Thanks
We are having the exact same issue
We're seeing this as well -- lots of requests from Azure Functions are showing up as failing
I think the current proposal is to add ENOTFOUND to this list: https://github.com/Azure/azure-sdk-for-js/blob/6e661d9fb78647801de66529d50c4c4d3660ff84/sdk/core/core-rest-pipeline/src/retryStrategies/exponentialRetryStrategy.ts#L103
Seems straightforward, the only thing I'm not 100% clear on is if this will impact scenarios like managed identity where we might want to fail fast. Though I think in the managed identity case we don't rely on DNS and the URLs are always IP-based. Is that correct @KarishmaGhiya ?
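For readers who don't want to follow the link, the shape of the proposal is roughly the following. This is a sketch under assumptions, not the actual contents of exponentialRetryStrategy.ts: the set name, the function name, and every entry other than ENOTFOUND are illustrative.

```typescript
// Illustrative only: the real list lives in the file linked above and is
// not reproduced here. All names below are hypothetical.
const RETRYABLE_SYSTEM_CODES: ReadonlySet<string> = new Set<string>([
  "ETIMEDOUT", // assumed example entry
  "ECONNRESET", // assumed example entry
  "ENOTFOUND", // the addition proposed in this thread
]);

/** True when the error carries a system error code that should be retried. */
function isRetryableSystemError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && RETRYABLE_SYSTEM_CODES.has(code);
}
```

With this shape, anything whose code is not in the set still fails fast, which is the behaviour the managed identity question is concerned with.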
Thank you for raising this. The issue has been resolved in the latest release of @azure/[email protected]
Thanks -- for anyone curious, we were eventually able to isolate the root cause for many of our transient DNS errors:
Our function app was in a VNET that didn't have enough IPs assigned to handle the number of instances we were trying to run.