Counters reported as Gauges in Prometheus metrics

Open danielgblanco opened this issue 1 year ago • 4 comments

What happened: Some of the Prometheus metrics exported by the VPC CNI plugin are defined with inaccurate metric types. For example:

https://github.com/aws/amazon-vpc-cni-k8s/blob/27ce1362636567592f006b987f3820c6b0fef55e/utils/prometheusmetrics/prometheusmetrics.go#L64

This metric (awscni_add_ip_req_count) is exported as a gauge, but its value is cumulative and only ever incremented. In fact, it seems to be used as a counter in:

https://github.com/aws/amazon-vpc-cni-k8s/blob/27ce1362636567592f006b987f3820c6b0fef55e/pkg/ipamd/rpc_handler.go#L70

It seems that awscni_del_ip_req_count is correctly exported as a counter.

I probably don't have enough context on this to make a judgement call. However, I think there are probably more Gauges that are operating as Counters.
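
One rough way to look for further candidates is to scan the scraped exposition text for metrics that are declared as gauges but have a counter-like name. The sketch below is only a heuristic under stated assumptions: the suffix list (`_count`, `_total`) and the sample input are illustrative and not taken from the plugin's actual output, and any match still needs to be checked against how the code updates the metric.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// suspectGauges returns the names of metrics that are declared as gauges in
// Prometheus text exposition format and whose names end in a counter-like
// suffix. The suffix list is an assumption chosen for illustration.
func suspectGauges(exposition string) []string {
	var suspects []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// A type declaration has the form: # TYPE <metric_name> <type>
		if len(fields) != 4 || fields[0] != "#" || fields[1] != "TYPE" {
			continue
		}
		name, typ := fields[2], fields[3]
		if typ != "gauge" {
			continue
		}
		if strings.HasSuffix(name, "_count") || strings.HasSuffix(name, "_total") {
			suspects = append(suspects, name)
		}
	}
	return suspects
}

func main() {
	// Hypothetical sample input, shaped after the two metrics named in this issue.
	sample := `# TYPE awscni_add_ip_req_count gauge
awscni_add_ip_req_count 42
# TYPE awscni_del_ip_req_count counter
awscni_del_ip_req_count 40
`
	fmt.Println(suspectGauges(sample))
}
```

A name-based check like this can produce false positives (a gauge may legitimately end in `_count`), so it is only a starting point for review.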

Attach logs: N/A

What you expected to happen: I'd expect metrics to follow the semantic conventions defined in https://prometheus.io/docs/concepts/metric_types/

How to reproduce it (as minimally and precisely as possible): Using Prometheus exporters.

Anything else we need to know?: This may not be a critical issue if systems use Prometheus as the backend. However, it becomes a problem when Prometheus metrics are transformed into other representations. For example, OpenTelemetry Collectors will read this as a Gauge, which gives the aggregation a different meaning (e.g. one can change the temporality of counters from cumulative to delta or vice versa, which does not apply to gauges).
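
To make the temporality point concrete, here is a minimal sketch of a cumulative-to-delta conversion, the kind of transform that is only meaningful for a counter. It is not the OpenTelemetry Collector's implementation; the function name `cumulativeToDelta` and the reset handling (a drop is treated as a restart from zero) are assumptions for illustration.

```go
package main

import "fmt"

// cumulativeToDelta converts successive cumulative counter samples into
// per-interval deltas. A sample lower than its predecessor is treated as a
// counter reset, so the new value itself is taken as the delta.
func cumulativeToDelta(samples []float64) []float64 {
	if len(samples) < 2 {
		return nil
	}
	deltas := make([]float64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d < 0 {
			d = samples[i] // assumed reset: the counter restarted from zero
		}
		deltas = append(deltas, d)
	}
	return deltas
}

func main() {
	// Hypothetical scrape values for a request counter, including one restart.
	fmt.Println(cumulativeToDelta([]float64{10, 15, 15, 3, 8}))
}
```

For a true gauge, a drop in value is a legitimate change rather than a reset, so this conversion would give wrong results. A pipeline that sees the metric typed as a gauge has no reason to apply it, which is why the declared type matters downstream.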

Environment:

  • Kubernetes version (use kubectl version): 1.28.12
  • CNI Version: 1.16.3
  • OS (e.g: cat /etc/os-release): Bottlerocket 1.21.0
  • Kernel (e.g. uname -a): x86_64 GNU/Linux

danielgblanco, Sep 10 '24 16:09