opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

[cmd/mdatagen] Ability to disable/enable all metrics

Open dmitryax opened this issue 1 year ago • 21 comments

Problem

We have a few issues when it's not desirable to configure the emitted metrics and resources starting from the "default" set. Sometimes, it's needed to start with an empty set and enable just a few metrics or start with enabling all metrics/resources (default+optional) and explicitly disabling a few of them.

This is needed at least for:

  • https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/28832
  • https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30894

Possible solutions

Given that the metrics and resource_attributes interfaces provide a map name: config, we need to introduce special metric names that can be applied to a group of metrics/resources.

Option 1

Have a predefined special metric name that cannot collide with existing names, e.g., _ALL.

metrics:
  _ALL:
    enabled: false
  metric1:
    enabled: true

Option 2

Support glob-style * in metric names. Using * in metric/attribute names isn't valid according to the OTel specification, so it cannot cause collisions. Providing this capability makes the user interface more powerful, as users will be able to disable/enable groups of metrics by their namespaces.

metrics:
  '*':
     enabled: false
  metric1:
    enabled: true

Providing this interface can cause collisions if the several globs match the same metric (e.g. * and system.* match any system.* metric). Once such collisions are detected, the config validation should fail because there is no way to clearly define precedence with a map.

This approach requires an evaluation from the implementation perspective. It might over-complicate the mdatagen logic.

dmitryax avatar Feb 15 '24 19:02 dmitryax

I really want this feature.

The glob-style matching is really interesting. It would easily let you turn off a set of metrics you don't want, like *__replicas to turn off all replica metrics from the k8sclusterreceiver. As long as we can detect collisions during startup I really like that option.

I think we should be very strict about not introducing regex as any part of this solution now or in the future. I know it is common to take the step from glob to regex eventually, but I don't think that'd be a good idea.

TylerHelmuth avatar Feb 15 '24 21:02 TylerHelmuth

I think we should be very strict about not introducing regex as any part of this solution now or in the future. I know it is common to take the step from glob to regex eventually, but I don't think that'd be a good idea.

Right, I was thinking we can support * characters only, nothing else. But since other glob characters like ? [] are also invalid in OTel naming conventions, maybe we can just adopt glob as is. Definitely never regex.

dmitryax avatar Feb 15 '24 21:02 dmitryax

Supporting full glob is probably ok.

TylerHelmuth avatar Feb 15 '24 21:02 TylerHelmuth

If we used globs, I think we need more rules for how clashes interact. It feels like wanting to do something like

metrics:
  '*':
     enabled: false
  "k8s.pod.*":
    enabled: true

could arise to turn off all metrics except pod metrics

TylerHelmuth avatar Feb 15 '24 21:02 TylerHelmuth

I opened a related PR here which I was hoping I could leverage in the processesscraper deprecation work: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30995

I would be in favour of ditching this and going for the glob style instead. I think that would be very powerful. I'd be happy to take on the implementation.

Unless there exists a Go library for it that I'm not aware of, I'd implement glob matching with *, ?, and [] and then implement detecting globs and matching against metric names when accepting a metric builder config.

For clashes, I would need to look at this again but when the metrics builder config is passed to, is the order as written in the config file maintained? Because in that case I think it would make the most sense to make it so globs are matched in the order they appear in the config.

braydonk avatar Feb 16 '24 13:02 braydonk

For clashes, I would need to look at this again but when the metrics builder config is passed to, is the order as written in the config file maintained? Because in that case I think it would make the most sense to make it so globs are matched in the order they appear in the config.

@braydonk It's not maintained. It's map

dmitryax avatar Feb 16 '24 18:02 dmitryax

If we used globs, I think we need more rules for how clashes interact. It feels like wanting to do something like

Do you mean we define that enabled: true takes precedence over enabled: false? What about other configuration options that we can add in the future? Like applying some aggregation. It's problematic even for enabled: if we allow clashes, I'd see users trying something like

metrics:
  '*':
     enabled: true
  "k8s.pod.*":
    enabled: false

which won't work for them in that case.

Trying to be smart and set precedence based on the size of the matching metrics set seems to be an overkill. I think we should start with not allowing clashes.

dmitryax avatar Feb 16 '24 18:02 dmitryax

Not allowing clashes at the start would be really nice, but I think will take away a lot of the use cases. The implementation will help users who want to turn on all their metrics, but not users who want to turn off most of the metrics. Maybe that is ok?

TylerHelmuth avatar Feb 16 '24 19:02 TylerHelmuth

The implementation will help users who want to turn on all their metrics, but not users who want to turn off most of the metrics.

Why not? You just disable all of them with * and enable some of them explicitly by names

dmitryax avatar Feb 16 '24 19:02 dmitryax

Another option is to allow clashes but specify priority explicitly as an additional field e.g.:

metrics:
  '*':
    matching_order: 1
    enabled: true
  "k8s.pod.*":
    matching_order: 2
    enabled: false

Seems a bit ugly to me

dmitryax avatar Feb 16 '24 19:02 dmitryax

You just disable all of them with * and enable some of them explicitly by names

Oh I was under the impression that would be considered a clash. If that is allowed we're good - no clash support needed.

TylerHelmuth avatar Feb 16 '24 19:02 TylerHelmuth

Pinging code owners for cmd/mdatagen: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Feb 16 '24 19:02 github-actions[bot]

@braydonk will you be able to take this?

dmitryax avatar Feb 26 '24 17:02 dmitryax

Yes I will take it.

braydonk avatar Feb 27 '24 14:02 braydonk

I have the wildcard matching working. I implemented two types of matching:

* - 0 to n of any character {a,b,...} - A collection of multiple match options

I can implement any other types of wildcards if it would make sense, but every use case I can think of only really benefits from these two options.

The matching also has a priority system where the earlier wildcards are applied first. This way the scenario where we see something like:

'*':
  enabled: false
'process.memory.*':
  enabled: true

This will work in a deterministic and (I think) sensible order; all metrics will be disabled, then all process.memory metrics will be enabled.
I'm working on writing a proper specification for the format and priority orders to iron out the edge cases.

The next step is to determine how this code is included in a generated metadata package. I see two reasonable possibilities:

  1. The wildcard matching code is included directly as another .go.tmpl file (or in an existing one) and it's used locally within the generated package.
  2. A go module is added somewhere under go.opentelemetry.io and the package is imported by generated code.

Option 1 is definitely easier to implement, but working with the code in template files might be frustrating for future maintenance. But this also probably doesn't deserve to be a new module because it seems pretty specific to mdatagen.

I'm going to start by implementing option 1 in a branch so I can put something up, but want to leave the possibility of option 2 open for now.

braydonk avatar Mar 19 '24 14:03 braydonk

I have opened https://github.com/open-telemetry/opentelemetry-collector/pull/10065

I ended up going with option 1 from my comment above for the PR.

P.S. This issue should probably be moved to opentelemetry-collector since this component has moved there now.

braydonk avatar May 01 '24 20:05 braydonk

@braydonk, I don't think we can rely on the order of the map keys because it's not supposed to be preserved

dmitryax avatar May 04 '24 16:05 dmitryax

The PR I posted takes that into account and works regardless of the order. There's a custom priority system. The priority of the pattern goes down for every . in the name, so * will be viewed as higher priority than x.* regardless of what order the patterns are read.

braydonk avatar May 04 '24 16:05 braydonk

How does it work with the {...}? Which one wins compared to *? What if I have prefix.* and *.suffix?

dmitryax avatar May 04 '24 16:05 dmitryax

I have written a full specification at https://gist.github.com/braydonk/ccb6775331fdd5dca91a650330b9839f.

braydonk avatar May 28 '24 14:05 braydonk

Thank you for putting the doc! Once we have the PR merged, please submit a PR to add this in cmd/mdatagen

dmitryax avatar Jun 13 '24 17:06 dmitryax