opentelemetry-js
In the browser environment, if there are many requests on the page and the Chrome tab is frequently switched back and forth, sendBeacon requests remain in pending status and the console reports a promise error
What happened?
Steps to Reproduce
Visit a page with more than 200 requests, switch to another tab while data is being sent to the server, and then switch back quickly.
Expected Result
The data is sent successfully, and there is no promise error in the console.
Actual Result
sendBeacon requests remain in pending status, and the console reports a promise error.
Additional Details

OpenTelemetry Setup Code
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web'
import { Resource } from '@opentelemetry/resources'
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
import { ZoneContextManager } from '@opentelemetry/context-zone'
import { registerInstrumentations } from '@opentelemetry/instrumentation'
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch'
import { XMLHttpRequestInstrumentation } from '@opentelemetry/instrumentation-xml-http-request'
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { CustomIdGenerator } from '../utils/generator'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
// ... (provider and instrumentation setup elided)

provider.register({
  contextManager: new ZoneContextManager(),
})
const collectorOptions = {
  url: 'xxx/api/traces', // url is optional and can be omitted
};
const exporter = new OTLPTraceExporter(collectorOptions);
const spanProcess = new BatchSpanProcessor(exporter, {
  // The maximum queue size. After the size is reached spans are dropped.
  maxQueueSize: 100,
  // The maximum batch size of every export. It must be smaller or equal to maxQueueSize.
  maxExportBatchSize: 10,
  // The interval between two consecutive exports
  scheduledDelayMillis: 2000,
  // How long the export can run before it is cancelled
  exportTimeoutMillis: 30000,
})
provider.addSpanProcessor(spanProcess);
package.json
"@opentelemetry/api": "^1.3.0",
"@opentelemetry/context-zone": "^1.8.0",
"@opentelemetry/exporter-trace-otlp-http": "^0.34.0",
"@opentelemetry/instrumentation": "^0.34.0",
"@opentelemetry/instrumentation-fetch": "^0.34.0",
"@opentelemetry/instrumentation-xml-http-request": "^0.34.0",
"@opentelemetry/otlp-transformer": "^0.34.0",
"@opentelemetry/resources": "^1.8.0",
"@opentelemetry/sdk-trace-base": "^1.8.0",
"@opentelemetry/sdk-trace-web": "^1.8.0",
"@opentelemetry/semantic-conventions": "^1.8.0",
Relevant log output
No response
I'm sorry it isn't really clear from your report, but is the data actually sent successfully or not? Is the bug just the log statement? Can you include the stacktrace so I can try to track down where this is actually coming from?
The data is in pending status and there is a lot of it; it has not been verified whether the transmission succeeds. However, the console throws a promise error, which affects our exception-collection metrics. Most span data comes from the xhr and fetch instrumentations. There is stack information, which currently points here: https://github.com/open-telemetry/opentelemetry-js/blob/2fb80ebe19c7fb16704fb39e3df1faa4845a9f8c/experimental/packages/otlp-exporter-base/src/platform/browser/util.ts#L38
It may be difficult to troubleshoot this problem from my description alone, because in general users will not switch tabs as aggressively as I do; even if '_finishedSpans' is very long, the spans are still queued for sending. However, if you switch to another tab while sending, I guess the 'visibilitychange' or 'pagehide' event is triggered, which requires the 'flushAll' function to run at the same time. You can simulate this scenario: let the maxQueueSize get as close to its maximum as possible, then switch to another page while data is being sent and quickly switch back. Repeat this several times and observe the console.
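For context, here is a rough sketch (not the SDK's actual source; `spanProcess` is the processor from the setup code above) of why tab switching matters: the web BatchSpanProcessor flushes its queue when the page is hidden or unloaded, so rapid tab switching can make these flushes race with the regular timer-driven export.

```ts
// Rough sketch of the page-lifecycle hooks that trigger extra flushes when a
// tab is hidden; the real SDK wires this up internally.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    // Force all queued spans out before the page is backgrounded.
    void spanProcess.forceFlush();
  }
});
window.addEventListener('pagehide', () => {
  void spanProcess.forceFlush();
});
```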

The bug here is most likely in the simultaneous execution of the flush, then. I believe the spec says it should not be executed concurrently.
Labeling as p2 since it doesn't seem to be a crasher or to affect your application (please correct me if that's wrong), but it may mean telemetry is missing from the backend.
We are seeing the same error, and the stacktrace suggests that navigator.sendBeacon() is returning false here, ultimately causing an unhandled promise rejection to bubble up.
My guess is the data has exceeded the browser limit for queueing the sendBeacon request (https://w3c.github.io/beacon/#return-value):
If the amount of data to be queued exceeds the user agent limit (as defined in HTTP-network-or-cache fetch), this method returns false
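As an illustration of that return-value contract (the batch and endpoint below are placeholders, not the exporter's internals), the caller has to check the boolean result itself, since a refused beacon does not throw:

```ts
// Minimal sketch: sendBeacon reports a full queue / oversized payload by
// returning false rather than throwing, so the caller must check the result.
const batch: unknown[] = []; // placeholder for serialized spans
const body = new Blob([JSON.stringify(batch)], { type: 'application/json' });
const queued = navigator.sendBeacon('/api/traces', body);
if (!queued) {
  // The user agent refused to queue the data (e.g. beacon size limit exceeded);
  // fall back to another transport or report the failure instead of letting an
  // unhandled rejection escape.
}
```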
BTW we use the BatchSpanProcessor with the following config:
{
  maxQueueSize: 100,
  maxExportBatchSize: 10,
  scheduledDelayMillis: 500,
  exportTimeoutMillis: 30000,
}
If this theory sounds reasonable, we will try reducing the maxQueueSize, maxExportBatchSize and scheduledDelayMillis to see if it helps.
On the other hand, I am not sure whether an SDK emitting unhandled promise rejections is acceptable under the OTel error-handling guidance: https://opentelemetry.io/docs/reference/specification/error-handling/
And finally, is there a way for users to configure error handlers for these errors?
Thanks
I believe we are also seeing this bug in production. In the interest of not just registering a "me too" comment, we can add the following details:
- We see this with the metrics SDK, rather than traces
- We also see stacktraces pointing to the same code
We are seeing around 100 per day, which is a rather small fraction of the metrics requests we are handling; so I do not believe it to be extraordinarily urgent. But, we do see it.
> We are seeing the same error, and the stacktrace suggests that navigator.sendBeacon() is returning false here, ultimately causing an unhandled promise rejection to bubble up. My guess is the data has exceeded the browser limit for queueing the sendBeacon request (https://w3c.github.io/beacon/#return-value): "If the amount of data to be queued exceeds the user agent limit (as defined in HTTP-network-or-cache fetch), this method returns false."
> BTW we use the BatchSpanProcessor with the following config: { maxQueueSize: 100, maxExportBatchSize: 10, scheduledDelayMillis: 500, exportTimeoutMillis: 30000 }
> If this theory sounds reasonable, we will try reducing the maxQueueSize, maxExportBatchSize and scheduledDelayMillis to see if it helps.
Yes, this seems reasonable. I'm not sure from the wording there whether a single batch is too big or the entire beacon queue has grown too large. If the former, then changing only maxExportBatchSize should suffice.
> On the other hand, I am not sure whether an SDK emitting unhandled promise rejections is acceptable under the OTel error-handling guidance: https://opentelemetry.io/docs/reference/specification/error-handling/
An unhandled promise rejection is equivalent to an unhandled throw and it is not acceptable. This is a bug. Thanks for the additional info and troubleshooting.
> And finally, is there a way for users to configure error handlers for these errors?
There is a global error handler, but if we aren't catching the error then we can't route it to the global error handler. You may also be able to use the window unhandledrejection event to log them if you want.
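For anyone wanting that stopgap now, a minimal sketch of such a listener (it only makes the rejections observable; it does not fix the export failure):

```ts
// Log unhandled promise rejections at the page level, including the ones
// currently escaping from the beacon-based exporter.
window.addEventListener('unhandledrejection', (event: PromiseRejectionEvent) => {
  console.warn('Unhandled promise rejection:', event.reason);
  // event.preventDefault(); // uncomment to suppress the default console error
});
```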
I'm changing this to p1 since it is throwing unhandled promise rejections
This is also an issue on the "receiving" server.
When a (recent) browser (older ones work differently) sends a sendBeacon request, it "expects" that the receiving server will NOT return a response body and will therefore return a 204 (with no body); this enables the browser to "reuse" the existing connection. If the server sends back a 200 with a response body, the browser (Chrome) closes the current connection, which means a new connection has to be established to the receiving server.
This issue is compounded when the "sender" issues multiple sendBeacon requests in a short amount of time, as this causes the browser to create/establish multiple connections to the same domain, with "many" of them ending up in the pending state and often never getting sent.
As mentioned above, older versions of Chromium (Chrome, Edge), and specifically the original Microsoft Edge (non-Chromium version), handle the "200" response without issues and can actually run into other issues if many 204s are returned. For Microsoft's internal SDK we have a configuration that lets the browser instance "detect" these environments; it works with the backend by supplying a query string to "inform" the receiving server that the runtime expects a 204 for the current request (assuming all is OK, i.e. it would normally have returned a 200).
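To make the 204 point concrete, here is a minimal sketch of a receiving endpoint that accepts beacon payloads and replies with no body (assuming an Express-based collector proxy; the route and port are placeholders, not part of any OTel component):

```ts
import express from 'express';

const app = express();
// sendBeacon typically posts JSON or a Blob; accept it as a raw body here.
app.use(express.raw({ type: '*/*', limit: '1mb' }));

app.post('/api/traces', (req, res) => {
  // ...forward req.body to the real trace backend here...
  // Reply with 204 and no body so the browser can reuse the connection.
  res.status(204).end();
});

app.listen(4318);
```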
It seems to me that the traces should now be sent using the fetch API with the keepalive parameter instead of the Beacon API (at least in browsers that support it, which is pretty much everyone except Firefox):
> The keepalive option can be used to allow the request to outlive the page. Fetch with the keepalive flag is a replacement for the Navigator.sendBeacon() API.
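For illustration, a sketch of what such an export call could look like (the URL and body are placeholders; this is not the exporter's actual implementation):

```ts
// Send a serialized batch with fetch + keepalive so the request can outlive
// the page, similar to navigator.sendBeacon but with an observable result.
async function exportBatch(url: string, body: string): Promise<void> {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body,
    keepalive: true,
  });
  if (!response.ok) {
    throw new Error(`Trace export failed with HTTP ${response.status}`);
  }
}

// Usage (placeholder values):
// await exportBatch('/api/traces', JSON.stringify(batch));
```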
> fetch API with the keepalive
The fetch keepalive flag also has the same 64 KB limitation, so if this is payload-related it's still going to be an issue (with getting the telemetry out of the environment).
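If it is indeed payload-related, a rough way to check is to measure the serialized batch before sending (the 64 KB figure is the commonly cited quota and `batch` is a placeholder here):

```ts
// Rough size check before handing a batch to sendBeacon / fetch keepalive.
const batch: unknown[] = []; // placeholder for the serialized spans
const payload = JSON.stringify(batch);
const bytes = new Blob([payload]).size;
if (bytes > 64 * 1024) {
  // Over the usual beacon/keepalive quota: split the batch or lower
  // maxExportBatchSize so each export stays under the limit.
}
```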