dd-sdk-android

Repeated network error due to "browser-intake-datadoghq.com"

Open lzanita09 opened this issue 1 year ago • 7 comments

Describe the bug

We have been seeing a ton of repeated errors like these from our release variant:

20:24:17.556  W  [_dd.sdk_core.default]: Batch c4e38407-3402-401c-83e2-171154a9180f [751 bytes] (Traces Request) failed because of a network error; we will retry later.
20:24:17.556  E  [_dd.sdk_core.default]: Unable to execute the request; we will retry later.
20:24:17.556  E  
20:24:17.556  W  [_dd.sdk_core.default]: Batch 4cc740c2-0824-4fff-9084-1c6d2f20c865 [1748 bytes] (RUM Request) failed because of a network error; we will retry later.
20:24:17.559  E  [_dd.sdk_core.default]: Unable to execute the request; we will retry later.
20:24:17.559  E  
20:24:17.559  W  [_dd.sdk_core.default]: Batch 02dd0b1f-5d59-4f0f-a115-04f14d3e1d54 [751 bytes] (Traces Request) failed because of a network error; we will retry later.
20:24:17.559  E  [_dd.sdk_core.default]: Unable to execute the request; we will retry later.
20:24:17.559  E  
20:24:17.559  W  [_dd.sdk_core.default]: Batch 251b08fa-cd95-4aac-9a2b-65114a3ea1ea [7033 bytes] (RUM Request) failed because of a network error; we will retry later.
20:24:17.561  E  [_dd.sdk_core.default]: Unable to execute the request; we will retry later.
20:24:17.561  E  
20:24:17.561  W  [_dd.sdk_core.default]: Batch af1811b7-b41d-4689-88d1-2838fd607dda [751 bytes] (Traces Request) failed because of a network error; we will retry later.
20:24:17.561  E  [_dd.sdk_core.default]: Unable to execute the request; we will retry later.

After putting a breakpoint on the DataOkHttpUploader#upload method, we were able to see the exception caught by the try/catch block around executeUploadRequest:

Unable to resolve host "browser-intake-datadoghq.com": No address associated with hostname

Because this happens repeatedly, it causes the app to drain the battery very quickly.

Reproduction steps

Unsure how to reproduce this issue consistently.

Logcat logs

No response

Expected behavior

No response

Affected SDK versions

2.11.0

Latest working SDK version

Unsure, tried 2.7.0, 2.8.0, 2.9.0, and 2.11.0, all had the same issue

Did you confirm if the latest SDK version fixes the bug?

Yes

Kotlin / Java version

Kotlin 2.0.0

Gradle / AGP version

Gradle 8.7, AGP 8.5.1

Other dependencies versions

No response

Device Information

No response

Other relevant information

No response

lzanita09, Jul 23 '24 03:07

Hi @lzanita09, thanks a lot for reaching out to us.

It looks like this is a DNS issue happening on the host device. Do you have more information on the devices where this issue happens (brand, model, OS version), and the network they were using at the time the issue happened?

In the meantime, we'll try to make a fix to prevent draining the battery when this situation occurs.

xgouchet, Jul 23 '24 08:07

Yep, it happened on my device too, so I have plenty of information about it: Pixel 6, running Android 14. When the issue happened, the device was connected to Wi-Fi.

It does seem like this is DNS-related. When I tried opening https://browser-intake-datadoghq.com/ in a browser on my device over cellular data, I got the error:

{"errors":[{"status":"404","title":"Not Found","detail":"HTTP path is invalid"}]}

But after connecting to Wi-Fi, the same URL gives me a browser error saying that the site cannot be reached.

lzanita09, Jul 23 '24 13:07

Is this a company, personal, or public Wi-Fi network? There might be some configuration blocking our URLs. Do you have access to the Wi-Fi router logs?

xgouchet, Jul 23 '24 14:07

Personal wifi, but unfortunately I don't think I can get the router logs.

lzanita09, Jul 23 '24 15:07

I do have some more info about the DNS, though. After switching from the ISP's default DNS to 1.1.1.1, I got that 404 error on my device instead of the unreachable error, which I assume would also fix the SDK's issue.

Happy to provide more DNS info if needed, though over email rather than here.

lzanita09, Jul 23 '24 15:07
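
For anyone trying to confirm the same diagnosis, here is a minimal sketch in plain Java that checks what the current resolver returns for the intake host. The class name DnsCheck, the check method, and the three result labels are made up for illustration and are not part of the SDK. The "SINKHOLED" case rests on an assumption: some filtering resolvers answer blocked names with 0.0.0.0 instead of failing the lookup. On Android, run this off the main thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of dd-sdk-android.
final class DnsCheck {

    // Returns "UNRESOLVED" when the lookup fails (the "Unable to resolve host" case),
    // "SINKHOLED" when every answer is the unspecified address (0.0.0.0 or ::),
    // and "RESOLVED" otherwise.
    static String check(String host) {
        InetAddress[] addresses;
        try {
            addresses = InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return "UNRESOLVED";
        }
        for (InetAddress address : addresses) {
            if (!address.isAnyLocalAddress()) {
                return "RESOLVED";
            }
        }
        return "SINKHOLED";
    }

    public static void main(String[] args) {
        // Defaults to the host named in this issue; pass another host as an argument.
        String host = args.length > 0 ? args[0] : "browser-intake-datadoghq.com";
        System.out.println(host + ": " + check(host));
    }
}
```

Running it once on the failing network and once on a working one (for example after switching the DNS server) should show whether the resolver is the difference.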

Yes, the DNS details would be very helpful. Can you please open a support ticket and mention this issue in it?

xgouchet, Jul 24 '24 10:07

Just did, the support ticket number is #1782669.

lzanita09, Jul 24 '24 12:07

Hello @lzanita09! It seems that the support ticket is resolved, so I'm closing this issue.

0xnm, Sep 17 '24 08:09

TIL that if you use Pi-hole as an ad-blocking DNS sinkhole, it will block browser-intake-datadoghq.com because the domain is on the standard-issue blacklist of domains. We use DD RUM and are on a version with this bug. The downside for the user is that the SDK makes 10,000s of network calls and drains their battery. Thanks for fixing it.

bensautner-comcast, Dec 06 '24 18:12

Hi @bensautner-comcast! Can you please share the version of the SDK you are using? We made some changes to the network stack in recent versions.

0xnm, Dec 09 '24 08:12

Hi @0xnm - we're on 2.2.0, and I opened a ticket internally to update to the latest. This was a mystery for some of the Android developers who happened to have an ad blocker, a privacy app, or a privacy DNS service. I noticed it while working from my home office with a Pi-hole DNS: exponentially increasing CONNECT calls with 503s in my debug proxy, and stack traces in logcat from an OkHttp retry interceptor called RetryAndFollowUpInterceptor.

This led me to these issues. It's easy to reproduce, so let me know if you'd like more info.

bensautner-comcast, Dec 09 '24 12:12

2.2.0 is a year-old SDK version, so it is definitely worth updating to the latest SDK version and checking whether the issue with excessive network calls still persists. The SDK will retry the request, increasing the retry delay with each failed attempt.

0xnm, Dec 09 '24 13:12
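
To illustrate the general idea of "increasing retry delay with each failed attempt", here is a minimal sketch of a capped, multiplicative backoff in plain Java. It is not the SDK's actual implementation: the class name UploadBackoff, its methods, and the delay values are all hypothetical.

```java
// Hypothetical illustration of a capped multiplicative backoff; not SDK code.
final class UploadBackoff {
    private final long minDelayMs;
    private final long maxDelayMs;
    private final double increaseFactor;
    private long currentDelayMs;

    UploadBackoff(long minDelayMs, long maxDelayMs, double increaseFactor) {
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.increaseFactor = increaseFactor;
        this.currentDelayMs = minDelayMs;
    }

    // Call after a failed upload: grows the delay, capped at maxDelayMs,
    // and returns how long to wait before the next attempt.
    long onFailure() {
        currentDelayMs = Math.min(maxDelayMs, Math.round(currentDelayMs * increaseFactor));
        return currentDelayMs;
    }

    // Call after a successful upload: resets the delay to the minimum.
    long onSuccess() {
        currentDelayMs = minDelayMs;
        return currentDelayMs;
    }

    public static void main(String[] args) {
        // Assumed values for the demo: 1 s minimum, 8 s cap, doubling on each failure.
        UploadBackoff backoff = new UploadBackoff(1_000, 8_000, 2.0);
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("failure " + attempt + ", wait " + backoff.onFailure() + " ms");
        }
        System.out.println("success, wait " + backoff.onSuccess() + " ms");
    }
}
```

The cap matters for the battery concern raised in this thread: when a host stays unreachable for hours, the number of attempts is bounded by the maximum delay rather than by how quickly each failure comes back.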

DNS servers that do ad blocking, malware blocking, or parental-control filtering are sometimes stubborn and only allow unknown websites after some time.

tdierks28, May 15 '25 07:05