opentelemetry-js
Single-shot metrics
- [ ] This only affects the JavaScript OpenTelemetry library
- [x] This may affect other libraries, but I would like to get opinions here first
I've been looking into how to export single-shot metrics but don't really see how. The issue with PeriodicExportingMetricReader is that it keeps sending metrics with the same value even if there are no new records.
For example, consider a use case where I want to emit a timestamp each time the user clicks a mouse button. I wouldn't want it to be sent if the user didn't make any clicks. But with PeriodicExportingMetricReader that's not possible.
meterProvider.addMetricReader(new PeriodicExportingMetricReader({
exporter: new OTLPMetricExporter({
url: 'https://otlp/v1/metrics'
}),
}));
meterProvider.getMeter('default').createHistogram(metricName).record(clickTimestamp);
There are two different things:
- when the metric is recorded/sampled, e.g. measure every 10 seconds
- when metrics are sent to the endpoint, e.g. every minute send all collected metrics to the endpoint
It seems the current implementation does both at the same time, but there should be a way to do the first one manually, which would allow for single-shot metrics.
Have you considered AggregationTemporality.DELTA? With that configuration, a metric aggregation that has already been exported is not exported again. It can be configured through the OTLPMetricExporterOptions constructor parameter of OTLPMetricExporter, like:
new OTLPMetricExporter({
url: 'https://otlp/v1/metrics',
temporalityPreference: AggregationTemporality.DELTA,
});
Hope this helps.
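To make the difference between the two temporalities concrete, here is a toy model of a sum aggregation. It is only a sketch and not the SDK's implementation: `ToySum`, `Temporality`, and `collect()` are hypothetical names invented for illustration. It shows that a cumulative aggregation reports the running total on every cycle, while a delta aggregation reports only what was recorded since the previous cycle.

```typescript
// Toy model only: these names are hypothetical and not part of OpenTelemetry.
type Temporality = "CUMULATIVE" | "DELTA";

class ToySum {
  private temporality: Temporality;
  private total = 0;            // everything recorded since start
  private intervalSum = 0;      // recorded since the previous collect()
  private intervalCount = 0;    // number of recordings since the previous collect()
  private everRecorded = false; // whether anything was recorded at all

  constructor(temporality: Temporality) {
    this.temporality = temporality;
  }

  add(value: number): void {
    this.total += value;
    this.intervalSum += value;
    this.intervalCount += 1;
    this.everRecorded = true;
  }

  // Returns the data points that one collection cycle would report.
  collect(): number[] {
    if (this.temporality === "CUMULATIVE") {
      // Cumulative: the running total is reported again on every cycle.
      return this.everRecorded ? [this.total] : [];
    }
    // Delta: report only the change, then reset the per-interval state.
    const points = this.intervalCount > 0 ? [this.intervalSum] : [];
    this.intervalSum = 0;
    this.intervalCount = 0;
    return points;
  }
}
```

In this model a delta aggregation with no new recordings yields an empty list of data points, which is a separate question from whether the surrounding metric is still sent.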
It solves the issue of the same data being resent, but it still keeps sending metrics to the endpoint all the time, with empty dataPoints:
{
name: "...",
"histogram": {
"aggregationTemporality": 1,
"dataPoints": []
}
}
To me it looks like there is a need for a MetricReader that sends out metrics only when they become available, instead of sending all the time when no new metrics have been emitted.
We have an issue tracking the ability to forget unused attributes: https://github.com/open-telemetry/opentelemetry-js/issues/2997. But there is no method to unregister an instrument yet -- as creating a new instrument is not as trivial as recording a metric event is, I would not recommend doing that in a repetitive way.
I don't think that would be a good way: add a metric, send it, then remove it. It especially doesn't make sense to remove it when you know you'll send it again later, just not periodically.
Do you suggest that we should not forget unused attributes or unregister instruments?
I think he's just saying that he wants to be able to export a metric only when a value is provided, without unregistering. Right now, even with delta temporality, we export an empty metric on each export interval. One option might be to detect when this is happening and simply not export, which wouldn't require unregistering.
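A minimal sketch of that "detect and don't export" idea, assuming the payload shape shown earlier in this thread (`name` plus `histogram.dataPoints`). `HistogramMetric`, `dropEmptyMetrics`, and `exportIfNonEmpty` are hypothetical names for illustration and not part of the OpenTelemetry JS API; the `send` callback stands in for whatever actually performs the network request.

```typescript
// Hypothetical types and helpers; the shape mirrors the payload quoted in this thread.
interface HistogramMetric {
  name: string;
  histogram: {
    aggregationTemporality: number;
    dataPoints: unknown[];
  };
}

// Keep only metrics that carry at least one data point.
function dropEmptyMetrics(metrics: HistogramMetric[]): HistogramMetric[] {
  return metrics.filter((metric) => metric.histogram.dataPoints.length > 0);
}

// Call `send` only when something is left after filtering.
// Returns true when a request was made, false when the cycle was skipped.
function exportIfNonEmpty(
  metrics: HistogramMetric[],
  send: (batch: HistogramMetric[]) => void
): boolean {
  const batch = dropEmptyMetrics(metrics);
  if (batch.length === 0) {
    return false; // nothing new was recorded, so no network request
  }
  send(batch);
  return true;
}
```

With a check like this, an export interval in which nothing was recorded would produce no request at all, and no instrument has to be unregistered.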
The OTLP exporter is welcome to skip exporting metrics with no data points, I feel. Does this need to be specified? I also feel that Delta temporality is the correct solution to ensure the desired behavior.
Lastly, in the Otel-Go API we avoided the verb "create" so that the expression for a single-shot metric would look more natural, e.g.,
meter.SyncFloat64().Histogram("histo").Record(ctx, 100)
@jmacd "The OTLP exporter is welcome to skip exporting metrics with no data points, I feel. Does this need to be specified?"
It would be definitely helpful to explicitly specify this.
"Lastly, in the Otel-Go API we avoided the verb "create" so that the expression for a single-shot metric would look more natural,"
Well... that's a good point. The API spec uses the term "create", and I find that most language implementations have adopted "create" or similar terms as well.
- https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md#meter-operations
- Java: "builder" https://github.com/open-telemetry/opentelemetry-java/blob/main/api/all/src/main/java/io/opentelemetry/api/metrics/Meter.java#L75
- dotnet: "create" https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/metrics/getting-started/Program.cs#L26 (I failed to find the definition other than this example)
- python: "create" https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/metrics/_internal/__init__.py#L239
The requested OTLP clarification is in https://github.com/open-telemetry/opentelemetry-specification/issues/2715.
Given that we support duplicate instrument registration, it's not clear that calling "create" will actually create anything. However, I think the use of "create" is still going to be idiomatic in some languages. We decided not to use "create" in the Golang context because Go has a style guideline to avoid superfluous prefixes, e.g., to avoid the "Get" prefix in the accessor "GetXXX()" and prefer just "XXX()".
So, you should be able to have single-shot metrics using the create() method over and over. The requirements for duplicate instrument registration allow this and you'll have no warnings as long as the repeat definitions are the same.
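As a rough illustration of why repeated identical "create" calls are harmless, here is a toy registry. It is a simplified model and not the SDK's implementation: `ToyRegistry` and `Definition` are hypothetical names, and the identity check here covers only name, kind, and unit, which is an assumption made to keep the sketch short.

```typescript
// Toy model only: hypothetical names, simplified identity rules.
interface Definition {
  name: string;
  kind: "histogram" | "counter";
  unit: string;
}

class ToyRegistry {
  private byName = new Map<string, Definition>();
  warnings: string[] = [];

  // Returns true when the definition was newly registered,
  // false when an existing registration was reused.
  register(def: Definition): boolean {
    const existing = this.byName.get(def.name);
    if (existing === undefined) {
      this.byName.set(def.name, def);
      return true;
    }
    // Same name seen before: warn only if the repeat definition differs.
    if (existing.kind !== def.kind || existing.unit !== def.unit) {
      this.warnings.push(`conflicting definition for "${def.name}"`);
    }
    return false;
  }
}
```

In this model, calling `register` repeatedly with the same definition reuses the first registration and records no warnings; only a conflicting repeat produces one.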
I wanted to provide some real-world examples of how this issue plays out, in hopes of raising its priority.
- The client SDK in the browser ships the empty data points for delta temporality metrics for every series. In a real world production app this can be many series, and as a result, will cause the queue logic to think it needs to export metrics faster than it normally would (more network requests). This also results in higher resource consumption on client devices due to the additional data throughput. This can be an issue with cellular data plans.
- The Collector (at least as of version 75) drops the empty data points from export. However, there are a couple of side effects here. With reduced logging in production we still see log spam for these errors on export. Additionally, the collector still queues up these metrics only to fail exporting them, which can cause batch to fill up faster than it otherwise would. Lastly, this may be my own confusion around the OTC app/host metrics, but it's incredibly difficult to identify "real" problems when logs are overly verbose and batch timeout metrics don't determine the root cause.
JFYI, the Python SDK also has this ability. It's unofficial, but it works: basically, set the reader's interval to infinity to stop periodic collection and call collect yourself. See this PR for discussion: https://github.com/open-telemetry/opentelemetry-python/pull/3059 I had this use case in a CLI application and tried it out here (a hacky thing just to see if it works, and it does): https://github.com/DGuhr/cli/tree/otel_integration_hack
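A comparable pattern might be possible in JavaScript, assuming the PeriodicExportingMetricReader is given a long exportIntervalMillis and that calling forceFlush() on the MeterProvider triggers a collect-and-export cycle. I have not verified this end to end, so treat it as a sketch. `HistogramLike`, `FlushableProvider`, and `recordAndFlush` are hypothetical names; the interfaces are structural stand-ins so the snippet is self-contained.

```typescript
// Structural stand-ins so the sketch has no dependencies. The assumption is
// that the SDK's histogram and meter provider objects fit these shapes.
interface HistogramLike {
  record(value: number): void;
}

interface FlushableProvider {
  forceFlush(): Promise<void>;
}

// Record one value, then ask for an immediate export instead of waiting
// for the next periodic cycle. The returned promise settles with the flush.
function recordAndFlush(
  histogram: HistogramLike,
  provider: FlushableProvider,
  value: number
): Promise<void> {
  histogram.record(value);
  return provider.forceFlush();
}
```

Flushing on every record trades batching for immediacy, so this would only suit low-frequency events such as the mouse-click example or a short-lived CLI run.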