Support disabled-by-default metrics using an advisory parameter.
What are you trying to achieve?
As an instrumentation author, I would like to record metrics that are disabled by default. Today, I need to add configuration to my instrumentation library to support enabling or disabling metrics.
As an end-user, I would like to be able to use a View (or other similar mechanism) to enable default-disabled metrics--ideally in a way that is compatible with the file-based SDK configuration so I can enable such metrics at runtime.
Proposed solution
I propose adding a new advisory parameter to the metrics Instrument API: "DefaultDisabled" (naming depends on the outcome of https://github.com/open-telemetry/opentelemetry-specification/issues/4344). If this parameter is provided on instrument creation, the SDK will use the DropAggregation by default instead of the standard default.
A user could enable this metric by changing the Aggregation of the metric stream from DropAggregation to the DefaultAggregation.
This is very similar to the approach taken by the Attributes advisory parameter, in that it allows instrumentation authors to provide verbose telemetry using the instrument API, but specify a different default for how it is aggregated on instrument creation.
Alternatives considered
- We could take a more general approach, and support a "DefaultAggregation" advisory parameter. This would allow something like measuring using histogram instrument, but defaulting to aggregating as a gauge. I'm not convinced those kinds of use-cases exist, though.
Additional context.
This came up in the context of https://github.com/open-telemetry/opentelemetry-go-contrib/issues/6321#issuecomment-2625351686.
cc @MrAlias
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#meterconfigurator Could the instrumentation's Meter (Scope) name be used to disable it by the end user? (No need of Views)
(Agree this is not something the instrumentation author can control.)
I really like the idea of having a standard way to implement disabled-by-default metrics.
We have several of these in Java instrumentation, and each one currently needs to have its own config property for users to be able to enable it.
This would be really helpful for the Collector as well, where we want to have some mechanism for components to report what metrics should be disabled with the default level (see open-telemetry/opentelemetry-collector/issues/11754)
We (@svrnm and @mx-psi in triage meeting) think this is a good idea. @MrAlias, as spec sponsor would you be willing to sponsor this?
What is the plan on how DefaultDisabled would interact with MeterConfig and MeterConfigurator? Would MeterConfig.disabled be set to DefaultDisabled if the MeterConfigurator returns "some signal indicating that the default MeterConfig should be used"?
@jade-guiton-dd this issue is about providing instrument-level configuration similar to other advisory parameters. If the Meter is disabled, no metrics will be collected, regardless of this setting. If the meter is enabled, only then would this (and other instrument-level) configuration apply.
I really like this idea.
I wish we could have a thing that supports more than 2 states so it could evolve. Using logging severity as analogy, I wish we had a config like
enum Importance:
High,
Medium
Low,
or
enum DefaultStatus # needs a better name
Enabled,
Disabled,
# we could add things in the future, but it'd still be limiting
We haven't even started to figure it out - https://github.com/open-telemetry/opentelemetry-specification/issues/3205, so I think we'd have to start with a boolean flag anyway.
A separate note: there were discussions about double negation recently #4344 #4351 and I think it'd be better to avoid using it here. I.e. instead of DefaultDisabled I'm suggesting DefaultStatus = Enabled or DefaultEnabled = true
@lmolkova Interesting idea. I wonder if that should be a higher-level configuration concept, since we already have knobs for selecting which attributes are disabled by default. I would imagine that getting "detailed" metrics would enable both default-disabled metrics, and default-disabled attributes. Maybe it should even change the aggregation for some metrics (e.g. it is a counter when getting "basic" metrics, but a histogram when getting "detailed" metrics)?
I do worry that giving instrumentation authors a log-level-like knob will lead to the same inconsistencies we see with logging, where everyone has a different definition of "Info", and libraries use it inconsistently, which requires us to add additional config knobs to allow users to fix. If there was some way for us to objectively define levels for metrics (e.g. "detailed" means enable all metrics, and "debug" means enable all metrics and all attributes) which could be layered on top of SDK config, that seems better to me than handing instrumentation authors a knob similar to log-level.
So I think the simpler option i'm proposing above is still the right first step.
I started working on a prototype of this, but quickly realized that Aggregation is an SDK concept, and thus can't be used in the API without more invasive changes. Instead, I think we should just add an advisory parameter to disable a metric by default.
PoC: https://github.com/open-telemetry/opentelemetry-go/pull/7721