Retry API request when DNS resolution of sentry.io fails

Open yangskyboxlabs opened this issue 3 months ago • 5 comments

Problem Statement

This is a follow-up to #2177: API request failed caused by: [6] Couldn't resolve host name (Could not resolve host: sentry.io)

We are seeing this when uploading symbol files from Linux hosts, with a failure rate of <10% (4 of the previous 50 invocations, at time of writing).

I suspect it is caused by unreliable .io TLD DNS servers, with the effect somehow amplified by some part of the DNS stack on Linux.

If I'm grokking the retry logic correctly, it only attempts retries on certain HTTP status codes, and does not cover the cases from other parts of the stack:

const RETRY_STATUS_CODES: &[u32] = &[
    http::HTTP_STATUS_502_BAD_GATEWAY,
    http::HTTP_STATUS_503_SERVICE_UNAVAILABLE,
    http::HTTP_STATUS_504_GATEWAY_TIMEOUT,
    http::HTTP_STATUS_507_INSUFFICIENT_STORAGE,
    http::HTTP_STATUS_524_CLOUDFLARE_TIMEOUT,
];

// ...

    pub fn send(mut self) -> ApiResult<ApiResponse> {
        // -- snip --
        loop {
            let mut out = vec![];
            debug!("retry number {retry_number}, max retries: {max_retries}",);

            let mut rv = self.send_into(&mut out)?;
            if retry_number >= max_retries || !RETRY_STATUS_CODES.contains(&rv.status) {
                rv.body = Some(out);
                return Ok(rv);
            }

            // -- snip --
        }
    }

Retrying the request when DNS resolution fails should alleviate this issue.
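A minimal sketch of what that could look like, assuming the retry decision is made on the whole result of an attempt rather than only on the status code. This is not sentry-cli's actual code: `SendError`, `send_with_retry`, the `backoff` parameter, and the closure-based `attempt` are hypothetical stand-ins, and the status codes are written as plain numbers instead of the `http::` constants.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for a transport-level failure of one attempt.
#[derive(Debug, PartialEq)]
enum SendError {
    /// libcurl error 6: the host name could not be resolved.
    DnsResolution,
    /// Any other failure; not retried in this sketch.
    Other(String),
}

/// Same status codes as the existing list, written numerically.
const RETRY_STATUS_CODES: &[u32] = &[502, 503, 504, 507, 524];

/// Runs `attempt` until it yields a non-retryable result or the
/// retry budget is exhausted. Both retryable HTTP statuses and DNS
/// resolution failures count as retryable (assumption of this sketch).
fn send_with_retry<F>(
    max_retries: u32,
    backoff: Duration,
    mut attempt: F,
) -> Result<u32, SendError>
where
    F: FnMut() -> Result<u32, SendError>,
{
    let mut retry_number = 0;
    loop {
        let result = attempt();
        let retryable = match &result {
            Ok(status) => RETRY_STATUS_CODES.contains(status),
            Err(SendError::DnsResolution) => true,
            Err(SendError::Other(_)) => false,
        };
        if !retryable || retry_number >= max_retries {
            return result;
        }
        retry_number += 1;
        // Linear backoff: wait a little longer before each new attempt.
        sleep(backoff * retry_number);
    }
}
```

The main difference from the current loop is that an `Err` from the attempt no longer short-circuits through `?` before the retry check runs.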

Solution Brainstorm

No response

yangskyboxlabs avatar Sep 16 '25 17:09 yangskyboxlabs

@yangskyboxlabs, this sounds like a good idea. I will place the issue on our backlog.

Implementation note

Seems like we can use this function to check whether the error is a DNS resolution error. We would need to downcast the APIError (via the source field) to the curl Error type.
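Roughly, the downcast could work like the sketch below, which uses only the standard library. `TransportError` is a hypothetical stand-in for the curl error type, and this `ApiError` is a simplified stand-in for the real one. The predicate checks for libcurl's code 6 (the `[6]` in the error message above). The name of the real predicate in the curl crate should be confirmed against its documentation.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the curl error type.
#[derive(Debug)]
struct TransportError {
    code: u32,
}

impl TransportError {
    /// True for libcurl code 6 ("Couldn't resolve host name").
    fn is_couldnt_resolve_host(&self) -> bool {
        self.code == 6
    }
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transport error [{}]", self.code)
    }
}

impl Error for TransportError {}

/// Simplified stand-in for the API error, wrapping an optional cause.
#[derive(Debug)]
struct ApiError {
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "API request failed")
    }
}

impl Error for ApiError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Walks the source chain and reports whether the first transport
/// error found in it is a DNS resolution failure.
fn is_dns_failure(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(transport) = e.downcast_ref::<TransportError>() {
            return transport.is_couldnt_resolve_host();
        }
        current = e.source();
    }
    false
}
```

Walking the whole chain instead of checking only the immediate source keeps the check working if another wrapper is later added between the API error and the transport error.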

szokeasaurusrex avatar Sep 17 '25 15:09 szokeasaurusrex

Just thought I'd add that we're also seeing this for around 5-10% of builds when artifacts are uploaded using @sentry/vite-plugin. The servers that experience this otherwise have no issues with DNS resolution, and are using Cloudflare's 1.1.1.1 DNS servers, so they're probably pretty reliable.

metatick avatar Oct 22 '25 16:10 metatick

We have the same issue: our builds sometimes fail at random due to DNS resolution issues...

GeorchW avatar Nov 18 '25 15:11 GeorchW

Thanks for letting us know. I am increasing this issue's priority on our internal backlog.

szokeasaurusrex avatar Nov 19 '25 10:11 szokeasaurusrex