Discrepancy between Chrome DevTools and DataDog RUM response time metric for Service Worker responses

Open eliranamar opened this issue 9 months ago • 2 comments

We are trying to achieve faster response times for English translation files by intercepting their fetch requests in a service worker.

We are observing a significant discrepancy in response time measurements for certain requests between Chrome DevTools and DataDog RUM. Our implementation uses a service worker to handle i18n translation files. For the English version (en-US), our service worker returns a synthesized empty JSON object, resulting in nearly instantaneous fetch times (∼1ms) as observed in Chrome DevTools' Network tab.

However, in the DataDog RUM explorer, these same responses are reported with latencies around 60ms. We believe the additional time may be due to instrumentation or processing delays captured by the RUM SDK that are not visible in the Network tab.

In general:

  1. The service worker intercepts a request for an en-US JSON translation file.
  2. The code path in the service worker immediately returns an empty JSON object ({}).
  3. Chrome DevTools Network tab reports a response time of around 1ms.
  4. DataDog RUM explorer, however, records the same request with a response time of approximately 60ms for the 75th percentile.
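For readers following along, the steps above could be sketched roughly as below. This is a minimal illustration, not the reporter's actual code: the header name `x-served-by`, its value, and the URL pattern are all assumptions (the thread only says that a unique response header was added and that en-US translation files are intercepted).

```javascript
// Hypothetical names: the header, its value, and the URL pattern are assumptions.
const SW_MARKER_HEADER = "x-served-by";
const SW_MARKER_VALUE = "service-worker";

// True for requests that look like the en-US JSON translation file.
function isEnUsTranslationRequest(url) {
  const { pathname } = new URL(url);
  return (
    pathname.includes("/i18n/") &&
    pathname.includes("en-US") &&
    pathname.endsWith(".json")
  );
}

// Build the synthesized empty JSON response, tagged so it can be filtered later.
function buildEmptyTranslationResponse() {
  return new Response("{}", {
    status: 200,
    headers: {
      "Content-Type": "application/json",
      [SW_MARKER_HEADER]: SW_MARKER_VALUE,
    },
  });
}

// Only register the fetch handler when actually running inside a service worker.
if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  self.addEventListener("fetch", (event) => {
    if (isEnUsTranslationRequest(event.request.url)) {
      event.respondWith(buildEmptyTranslationResponse());
    }
    // Any other request falls through to the network as usual.
  });
}
```

Keeping the matching and the response construction in plain functions makes them easy to check outside a service worker.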

Additional details:

  • npm version of @datadog/browser-rum: 6.2.1
  • We are sure that the recorded response is from the service worker, as we added a unique response header in the SW that we can filter on in RUM explorer.
  • To test our assumptions, we compared results for specific user sessions manually in DevTools and DataDog.

Questions:

  1. Could you help explain what additional processing or events DataDog RUM might be capturing that results in the higher timing (∼60ms) compared to the nearly instantaneous response observed in the network tab?
  2. Are there any known issues or configuration settings within DataDog RUM that may affect measurements for responses handled by service workers?
  3. Do you have any recommendations or best practices for aligning the RUM measurements more closely with the actual network and processing times in such scenarios?

Screenshots (images not preserved):

  • Fetch request example
  • Datadog distribution for the same SW request

eliranamar • Feb 19 '25 10:02

Thank you for the thorough report.

Investigation

I experimented a bit, and I can reproduce your issue (although not as pronounced as in your screenshots) when the response served by the Service Worker is big enough.

I think this can be explained as follows: in order to get the full duration of the request (including download), we actually read the whole body (see waitForResponseToComplete and readBytesFromStream), and reading the response is not included in the timings shown in Chrome DevTools. This is unfortunate, but based on the Fetch API alone there is no way to wait for the download to complete without reading the body.
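To make the effect concrete, here is a small sketch of the general technique of timing a fetch by draining its body. It is illustrative only and is not the SDK's waitForResponseToComplete or readBytesFromStream; the function name and the injectable clock are assumptions made for this example.

```javascript
// Illustrative only: NOT the SDK's implementation.
// Measures elapsed time at two points: when the Response object is available
// (headers received) and after the whole body has been read from its stream.
async function timeUntilBodyRead(response, startedAt, now = () => performance.now()) {
  const headersAt = now();
  let bytes = 0;

  if (response.body) {
    // Read from a clone so the caller can still consume the original body.
    const reader = response.clone().body.getReader();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      bytes += value.byteLength;
    }
  }

  const bodyReadAt = now();
  return {
    untilHeaders: headersAt - startedAt,
    untilBodyRead: bodyReadAt - startedAt,
    bytes,
  };
}
```

Called as `timeUntilBodyRead(await fetch(url), start)`, with `start` taken from `performance.now()` just before the fetch, the gap between `untilHeaders` and `untilBodyRead` is the part that would not appear in the Network tab. Each `reader.read()` resolves asynchronously, so it can be delayed when the main thread is busy.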

This issue is aggravated by https://github.com/DataDog/browser-sdk/issues/2566: usually, when no service worker is involved, we get the duration (and other timings) from the Performance Timing API, which should match exactly what is shown in Chrome DevTools. Unfortunately, when a service worker is involved, we currently fail to use this strategy (due to a Chromium issue). Once we work around this issue, we should report the exact timing as reported by the browser instead of computing our own, which will solve what you are experiencing.
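As a rough illustration of what the browser itself reports, the sketch below summarizes a resource timing entry using the standard PerformanceResourceTiming fields (startTime, workerStart, responseEnd, duration). The helper name and the shape of the returned object are assumptions for this example, and it does not reflect how the SDK consumes these entries.

```javascript
// Hypothetical helper: summarizes a PerformanceResourceTiming-like entry.
// workerStart is 0 when the request was not intercepted by a service worker.
function summarizeResourceEntry(entry) {
  const handledByWorker = entry.workerStart > 0;
  return {
    name: entry.name,
    total: entry.duration,
    handledByWorker,
    // Time spent before the fetch event was dispatched to the service worker.
    beforeWorker: handledByWorker ? entry.workerStart - entry.startTime : null,
    // Time from service worker dispatch until the last byte of the response.
    workerToResponseEnd: handledByWorker ? entry.responseEnd - entry.workerStart : null,
  };
}

// Browser-only usage: log a summary for each translation file request.
if (typeof window !== "undefined" && typeof PerformanceObserver !== "undefined") {
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.name.includes("/i18n/")) {
        console.log(summarizeResourceEntry(entry));
      }
    }
  }).observe({ type: "resource", buffered: true });
}
```

Comparing `total` here with the duration shown in the RUM explorer for the same request is one way to see how far the two measurements drift apart.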

Answers (to recap!)

Could you help explain what additional processing or events DataDog RUM might be capturing that results in the higher timing (∼60ms) compared to the nearly instantaneous response observed in the network tab?

This is probably caused by the time it takes to read the response (allocating memory, etc.), which is not shown in DevTools. This discrepancy might be more pronounced if the main thread is busy.

Are there any known issues or configuration settings within DataDog RUM that may affect measurements for responses handled by service workers?

Yes: https://github.com/DataDog/browser-sdk/issues/2566

Do you have any recommendations or best practices for aligning the RUM measurements more closely with the actual network and processing times in such scenarios?

What you could do (until we fix the known issue mentioned above) is to report the duration by your own means, maybe using a custom vital. It could look like this:


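// Report browser-measured timings for translation files as a custom vital.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name.includes("/i18n/")) {
      window.DD_RUM.addDurationVital("Translations fetch", {
        // entry.startTime is relative to the time origin; convert it to an epoch timestamp.
        startTime: performance.timeOrigin + entry.startTime,
        duration: entry.duration,
        description: entry.name,
      });
    }
  }
  // `buffered` only takes effect together with `type`, not with `entryTypes`.
}).observe({ type: "resource", buffered: true });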

Side note

To help investigate your issue, you could use our Developer Extension. It lets you compare Chrome timings with what is reported by the RUM Browser SDK locally, without having to reach Datadog.

BenoitZugmeyer • Feb 20 '25 18:02

Thank you for this thorough and detailed explanation! I'll implement the suggested PerformanceObserver solution and test it in our environment.

Regarding the Developer Extension - I noticed it doesn't show Service Worker requests, but I'll still use it for other things.

I'll keep you posted on the results and whether the PerformanceObserver solution resolves our timing discrepancy. Thanks again for your help!

eliranamar • Feb 26 '25 09:02

Hello @eliranamar, I’m closing this card because no further action is needed on our side. Don’t hesitate to reopen it if the PerformanceObserver solution doesn’t work.

amortemousque • May 28 '25 12:05