opentelemetry-js
opentelemetry-js copied to clipboard
Instrumentation Double-Init / Caching Bug w/ Prometheus Exporter
What happened?
This is a bit hard for me to extract our exact setup, but roughly following the docs for the autoloader:
const provider = new MeterProvider();
provider.addMetricReader(new PrometheusExporter());
api.metrics.setGlobalMeterProvider(provider);
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
],
});
When we run the following, prometheus will end up exporting the http metrics twice:
# HELP http_server_duration measures the duration of the inbound HTTP requests
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
# HELP http_server_duration measures the duration of the inbound HTTP requests
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
# HELP http_client_duration measures the duration of the outbound HTTP requests
# UNIT http_client_duration ms
# TYPE http_client_duration histogram
# HELP http_client_duration measures the duration of the outbound HTTP requests
# UNIT http_client_duration ms
# TYPE http_client_duration histogram
I added some debug code to the exporter:
for (let metric of metrics.scopeMetrics) {
console.log(`scope: ${metric.name}`)
for (const data of metric.metrics) {
console.log(data.descriptor)
}
and I see the following logs:
scope: @opentelemetry/instrumentation-http
{
name: 'http.server.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the inbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.server.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the inbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.client.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the outbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.client.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the outbound HTTP requests',
unit: 'ms',
valueType: 1
}
I think this is because the Instrumentation constructor inits the metrics first, then the call to registerInstrumentation passes in the same meterProvider to enableInstrumentations, triggering the instrumentation to re-create the metrics against the original meter/meterProvider.
Not sure what the "bug" is here, guessing it's either:
- Creating the metrics in the constructor, then re-initing with the same meter a little later in the autoloader
- Not caching/returning the cached metric when the meter hasn't changed
- Not de-duping these metrics in the prometheus exporter
OpenTelemetry Setup Code
No response
package.json
"@opentelemetry/api": "^1.3.0",
"@opentelemetry/exporter-prometheus": "^0.34.0",
"@opentelemetry/instrumentation-http": "^0.34.0",
"@opentelemetry/sdk-metrics": "^1.8.0",
Relevant log output
No response
Could also be user error / us looking at outdated docs -- we've since stopped calling registerInstrumentation, construction of the plugin seems to be sufficient to monkeypatch.
Assigned @hectorhdzg as he wrote the http metric instrumentation
@dyladan I think this would affect anything that extends InstrumentationAbstract, not necessarily specific to the http instrumentation
We just don't enable much else by default to have noticed/verified it affects others.
Currently we call _updateMetricInstruments whenever an "Instrumentation" class is instantiated, this method will create the "Metric Instruments" through a Meter obtained using the current MeterProvider, so if the "Instrumentation" is enabled through config(enabled by default) when instantiated then it will be fully functional, registerInstrumentations will update the MeterProvider in this case using exact same one that was being used before and the "Instrumentation" will create the "Metrics Instruments" again, I believe the correct fix here would be to add extra logic in the Instrumentation class when setMeterProvider is called, to check if the Meter is different or not, and only create the "Metric Instruments" when is something different. @legendecas @dyladan thoughts?, I can implement the fix just getting some consensus on what the correct fix would be
check if the Meter is different or not, and only create the "Metric Instruments" when is something different
This would get my vote, though it does mean the API would still allow end-users to double-create metric instruments on their own vs. caching and returning the same metric instrument from the meter.
Would you mind sharing the exact metrics output that you think contains duplicated metric records? I can not reproduce the problem with the following setup:
const { MeterProvider } = require('@opentelemetry/sdk-metrics');
const api = require('@opentelemetry/api');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
api.diag.setLogger(new api.DiagConsoleLogger(), { logLevel: api.DiagLogLevel.ALL });
const provider = new MeterProvider();
provider.addMetricReader(new PrometheusExporter());
api.metrics.setGlobalMeterProvider(provider);
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
],
});
const http = require('http');
http.createServer((req, res) => {
res.end('hello');
}).listen(1234);
{
"dependencies": {
"@opentelemetry/api": "^1.3.0",
"@opentelemetry/exporter-prometheus": "^0.34.0",
"@opentelemetry/instrumentation": "^0.34.0",
"@opentelemetry/instrumentation-http": "^0.34.0",
"@opentelemetry/sdk-metrics": "^1.8.0"
}
}
Although "same metric names" can exist in the output when you request http://localhost:9464/metrics, they are not "identical" metric records since their attributes (or labels in prometheus) are not the same.
Duplicated metric instrument registrations are not created multiple times. Instead, the returned metric instruments are sharing the same metric storages (https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/sdk-metrics/src/state/MeterSharedState.ts#L128). So I don't find this is a problem with _updateMetricInstruments.
@legendecas thank you for the base code, this should show the duplicate metrics/series:
const { MeterProvider } = require('@opentelemetry/sdk-metrics');
const api = require('@opentelemetry/api');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
api.diag.setLogger(new api.DiagConsoleLogger(), { logLevel: api.DiagLogLevel.ALL });
const provider = new MeterProvider();
const exporter = new PrometheusExporter();
const old = exporter._exportMetrics
exporter._exportMetrics = (response) => {
exporter.collect().then((collectionResult) => {
const { resourceMetrics, errors } = collectionResult;
for (const metric of resourceMetrics.scopeMetrics) {
console.log(`scope: ${metric.scope.name}`);
console.log(metric);
for (const series of metric.metrics) {
console.log(series.descriptor);
}
}
})
old(response);
}
provider.addMetricReader(exporter);
api.metrics.setGlobalMeterProvider(provider);
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
],
});
const http = require('http');
http.createServer((req, res) => {
res.end('hello');
}).listen(1234);
Running this I see:
scope: @opentelemetry/instrumentation-http
{
scope: {
name: '@opentelemetry/instrumentation-http',
version: '0.34.0',
schemaUrl: undefined
},
metrics: [
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: []
},
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: []
},
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: []
},
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: []
}
]
}
{
name: 'http.server.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the inbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.server.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the inbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.client.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the outbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.client.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the outbound HTTP requests',
unit: 'ms',
valueType: 1
}
As far as I can tell, all those histograms are the same labels. Run the following (removes the call to registerInstruments):
const { MeterProvider } = require('@opentelemetry/sdk-metrics');
const api = require('@opentelemetry/api');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
api.diag.setLogger(new api.DiagConsoleLogger(), { logLevel: api.DiagLogLevel.ALL });
const provider = new MeterProvider();
const exporter = new PrometheusExporter();
const old = exporter._exportMetrics
exporter._exportMetrics = (response) => {
exporter.collect().then((collectionResult) => {
const { resourceMetrics, errors } = collectionResult;
for (const metric of resourceMetrics.scopeMetrics) {
console.log(`scope: ${metric.scope.name}`);
console.log(metric);
for (const series of metric.metrics) {
console.log(series.descriptor);
}
}
})
old(response);
}
provider.addMetricReader(exporter);
api.metrics.setGlobalMeterProvider(provider);
new HttpInstrumentation()
const http = require('http');
http.createServer((req, res) => {
res.end('hello');
}).listen(1234);
and I see:
scope: @opentelemetry/instrumentation-http
{
scope: {
name: '@opentelemetry/instrumentation-http',
version: '0.34.0',
schemaUrl: undefined
},
metrics: [
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: [Array]
},
{
descriptor: [Object],
aggregationTemporality: 1,
dataPointType: 0,
dataPoints: []
}
]
}
{
name: 'http.server.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the inbound HTTP requests',
unit: 'ms',
valueType: 1
}
{
name: 'http.client.duration',
type: 'HISTOGRAM',
description: 'measures the duration of the outbound HTTP requests',
unit: 'ms',
valueType: 1
}
To be clear, the individual data points aren't duplicated, but in the case where the HTTP instrumentation produces 4 metrics under the scoped metrics, 2 of those metrics will never have any data points, because the instrument threw away the metrics and re-created them in _updateMetricInstruments
Although "same metric names" can exist in the output when you request http://localhost:9464/metrics, they are not "identical" metric records since their attributes (or labels in prometheus) are not the same.
In the example below (working properly) each line has different labels, correct:
# HELP http_server_duration measures the duration of the inbound HTTP requests
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
http_server_duration_count{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="1234"} 2 1670974575998
http_server_duration_sum{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="1234"} 9 1670974575998
In the case where we call registerInstruments, we get the # HELP, #UNIT and #TYPE lines twice, which is the duplicate I was referencing.
# HELP http_server_duration measures the duration of the inbound HTTP requests
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
<!-- Prometheus reads this as an empty metric -->
# HELP http_server_duration measures the duration of the inbound HTTP requests
# UNIT http_server_duration ms
# TYPE http_server_duration histogram
http_server_duration_count{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="1234"} 2 1670974575998
http_server_duration_sum{http_scheme="http",http_method="GET",net_host_name="localhost",http_flavor="1.1",http_status_code="200",net_host_port="1234"} 9 1670974575998