opentelemetry-js
opentelemetry-js copied to clipboard
Handling of cross-origin request network timings
What happened?
Steps to Reproduce
When a cross-origin request is done (eg. example.com -> resources.vendor.com, but also example.com -> static.example.com or even localhost:3000(a webpack server) -> localhost:8080 (API server)), the PerformanceResourceTiming generated for these doesn't include most of the values by default
Note: When CORS is in effect, many of these values are returned as zero unless the server's access policy permits these values to be shared. This requires the server providing the resource to send the
Timing-Allow-OriginHTTP response header with a value specifying the origin or origins which are allowed to get the restricted timestamp values.The properties which are returned as 0 by default when loading a resource from a domain other than the one of the web page itself:
redirectStart,redirectEnd,domainLookupStart,domainLookupEnd,connectStart,connectEnd,secureConnectionStart,requestStart, andresponseStart.
Currently otel/sdk-trace-web addSpanNetworkEvents doesn't have any checks for this, which means 0 values get passed to span.addEvent, where timeorigin (time at which page loaded) gets added, causing times that look normal but are usually before span start:
documentLoad
span start time: 1661419670581.9
fetchStart: 1661419670581.9
unloadEventStart: 1661419670378.3
unloadEventEnd: 1661419670378.3
domInteractive: 1661419671104.4
resourceFetch (to another subdomain)
span start: 1661419671106.4
fetchStart: 1661419671106.4
domainLookupStart: 1661419670378.3
domainLookupEnd: 1661419670378.3
connectStart: 1661419670378.3
secureConnectionStart: 1661419670378.3
connectEnd: 1661419670378.3
requestStart: 1661419670378.3
responseStart: 1661419670378.3
responseEnd: 1661419671153.8
This sort of data isn't easy to filter out when processing is done on span level (there's no info of what is 0 value in browser), meaning simplest way to generate the request phases chart (eg. responseEnd - responseStart to calculate response duration) generating longer timings than makes sense:
Expected Behavior
Don't include values that are known to be unavailable in this case, so analysis tools could more easily handle this case differently (This might be considered a breaking change that people would be against? so I decided to first open this issue to suggest this instead of going straight to PR)
If there's too much resistance for changing current behaviour this could also be something speced out during RUM SIG and fix implemented when instrumentation follows those specs (kinda touching #3174), but as user of current instrumentation in signalfx/splunk would rather fix the current situation
This looks helpful i support this change. Is there any process currently for introducing breaking changes? For example, this change could be based on an InstrumentationConfig option and you could give notice to folks via changeling/ release notes of the npm package.
Is there any process currently for introducing breaking changes?
I don't think there is any, as long as we have a good reason to make the breaking change. I believe if the current 0-valued events are meaningless in CORS requests, it should be ok for us to not generate these events -- they are not working as expected anyways.