enhancement(tag_cardinality_limit): Add metric and tag name to tag_value_limit_exceeded_total metric

Open kaarolch opened this issue 1 month ago • 5 comments

Summary

When using the tag_cardinality_limit transform, it's difficult to identify which specific metrics and tag keys are hitting the configured value limit. The tag_value_limit_exceeded_total metric only counts events that exceeded the limit, without saying which metric or tag was affected, which makes cardinality issues hard to debug and monitor. See #20084 for more context.

This PR adds metric_name and tag_key labels to the tag_value_limit_exceeded_total metric, which allows users to:

  • Identify which specific metrics are hitting the limit
  • Identify which tag keys are causing the limit to be exceeded
  • Create more targeted alerts and dashboards

Vector configuration

sources:
  vector_metrics:
    type: "internal_metrics" # required
    scrape_interval_secs: 10 
  statsd:
    type: statsd
    address: "0.0.0.0:8128"
    mode: "udp"

transforms:
  metrics_cardinality_limit:
    type: tag_cardinality_limit
    inputs:
      - statsd
    limit_exceeded_action: drop_tag
    internal_metrics:
      include_key_in_limit_metric: true
    mode: exact
    value_limit: 10
sinks:
  drop_bh:
    type: blackhole
    inputs:
      - metrics_cardinality_limit
    print_interval_secs: 0
    buffer:
      type: memory
      when_full: drop_newest
      max_events: 100
  console_debug:
    type: console
    inputs:
      - vector_metrics
    encoding:
      codec: json
    buffer:
      max_events: 10000
      type: memory
      when_full: drop_newest

How did you test this PR?

Built locally following docs/DEVELOPING.md and started Vector with the config from the previous section:

target/debug/vector --config ../vector.yaml
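
The description does not show how the test traffic was generated. As a rough sketch, the script below builds gauge lines with more distinct pod_name values than the configured value_limit of 10 and can send them to the statsd source over UDP. The DogStatsD-style tag suffix (|#key:value), the helper names build_lines and send_lines, and the generated tag values are assumptions for illustration; only the metric name, tag key, and port come from this PR.

```python
import socket


def build_lines(metric_name, tag_key, distinct_values, value=1):
    """Return one gauge line per distinct tag value.

    Assumes the statsd source accepts DogStatsD-style tags (|#key:value).
    """
    return [
        f"{metric_name}:{value}|g|#{tag_key}:{tag_key}-{i}"
        for i in range(distinct_values)
    ]


def send_lines(lines, host="127.0.0.1", port=8128):
    """Send each line as its own UDP datagram (port 8128 matches the config above)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))


# 50 distinct pod_name values, well above value_limit: 10.
lines = build_lines("high_cardinality_metric_gauge_1", "pod_name", 50)
```

Calling send_lines(lines) repeatedly while Vector is running should push pod_name past the limit, so the transform starts dropping the tag and incrementing tag_value_limit_exceeded_total.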

Result (with internal_metrics.include_key_in_limit_metric: true, as configured above):

{"name":"tag_value_limit_exceeded_total","namespace":"vector","tags":{"component_id":"metrics_cardinality_limit","component_kind":"transform","component_type":"tag_cardinality_limit","host":"xxxx","metric_name":"high_cardinality_metric_gauge_1","tag_key":"pod_name"},"timestamp":"2025-11-13T20:10:30.055509Z","kind":"absolute","counter":{"value":7470.0}}

When internal_metrics.include_key_in_limit_metric: false:

{"name":"tag_value_limit_exceeded_total","namespace":"vector","tags":{"component_id":"metrics_cardinality_limit","component_kind":"transform","component_type":"tag_cardinality_limit","host":"xxxx"},"timestamp":"2025-11-19T09:19:20.321422Z","kind":"absolute","counter":{"value":2430.0}}
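
To illustrate how the new labels could be used for debugging, here is a minimal sketch that reads JSON events like the ones above (for example, captured from the console sink) and ranks (metric_name, tag_key) pairs by counter value. The summarize function is hypothetical, the sample events are trimmed copies of the output above, and the script assumes a single tag_cardinality_limit transform, so component_id is ignored.

```python
import json
from collections import defaultdict

# Trimmed copies of the two events shown above (most tags omitted for brevity).
SAMPLE = [
    '{"name":"tag_value_limit_exceeded_total","namespace":"vector",'
    '"tags":{"metric_name":"high_cardinality_metric_gauge_1","tag_key":"pod_name"},'
    '"kind":"absolute","counter":{"value":7470.0}}',
    '{"name":"tag_value_limit_exceeded_total","namespace":"vector",'
    '"tags":{"component_id":"metrics_cardinality_limit"},'
    '"kind":"absolute","counter":{"value":2430.0}}',
]


def summarize(json_lines):
    """Rank (metric_name, tag_key) pairs by their highest observed counter value."""
    totals = defaultdict(float)
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        if event.get("name") != "tag_value_limit_exceeded_total":
            continue
        tags = event.get("tags", {})
        # Both labels are absent (None) when include_key_in_limit_metric is false.
        key = (tags.get("metric_name"), tags.get("tag_key"))
        # The events are absolute counters, so keep the largest value seen per pair.
        totals[key] = max(totals[key], event["counter"]["value"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = summarize(SAMPLE)
```

Events without the new labels end up under the (None, None) key, which makes it easy to see how much of the count cannot be attributed to a specific metric or tag.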

Change Type

  • [ ] Bug fix
  • [ ] New feature
  • [x] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes
  • [x] No

Does this PR include user facing changes?

  • [x] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

kaarolch, Nov 13 '25 19:11