
[feature] Introduce metrics aggregation by labelNames

Open · tjiuming opened this issue 3 years ago · 1 comment

Currently, the Prometheus Java client does not support metrics aggregation; the exposed metrics data is the original, un-aggregated data. For example:

Counter c = Counter.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
c.labels("a1", "b1", "c1").inc();
c.labels("a1", "b1", "c2").inc();
c.labels("a1", "b2", "c3").inc();
c.labels("a1", "b2", "c4").inc();

the exposed metrics are as below:

metrics_name_total{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name_total{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c4"} 1

But in some cases, we want to expose the metrics at custom aggregation levels. Say, expose the metrics data at the cluster level as below:

metrics_name_total{cluster="a1"} 4

or at the [cluster, namespace] level as below:

metrics_name_total{cluster="a1", namespace="b1"} 2
metrics_name_total{cluster="a1", namespace="b2"} 2

If this request is feasible, it will greatly reduce the pressure on the Prometheus server side. It also benefits the client side, because it reduces the size of the scrape response body.
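Conceptually, the aggregation is just a SUM over the leading label values. As a rough standalone sketch (plain Java, no client library; the class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse full-label series onto their first k labels and SUM the values.
public class LabelAggregationSketch {

  // series: full label values -> sample value; k: number of leading labels to keep
  static Map<List<String>, Double> sumByPrefix(Map<List<String>, Double> series, int k) {
    Map<List<String>, Double> out = new LinkedHashMap<>();
    for (Map.Entry<List<String>, Double> e : series.entrySet()) {
      List<String> prefix = e.getKey().subList(0, k);
      out.merge(prefix, e.getValue(), Double::sum);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<List<String>, Double> series = new LinkedHashMap<>();
    series.put(List.of("a1", "b1", "c1"), 1.0);
    series.put(List.of("a1", "b1", "c2"), 1.0);
    series.put(List.of("a1", "b2", "c3"), 1.0);
    series.put(List.of("a1", "b2", "c4"), 1.0);

    // cluster level: [a1] -> 4.0
    System.out.println(sumByPrefix(series, 1));
    // [cluster, namespace] level: [a1, b1] -> 2.0 and [a1, b2] -> 2.0
    System.out.println(sumByPrefix(series, 2));
  }
}
```

Keeping the first k label values as the group key reproduces the cluster-level and [cluster, namespace]-level outputs above.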

Implementation

Aggregator

We can introduce two aggregators, SUM and AVG. For COUNTER/GAUGE/HISTOGRAM we can apply the SUM aggregator; for SUMMARY we can apply both the AVG and SUM aggregators.
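A minimal sketch of what the two aggregators could look like (hypothetical API, not part of the client):

```java
import java.util.List;

// Hypothetical sketch of the two proposed aggregators over a set of sample values.
public enum Aggregator {
  SUM {
    @Override
    public double apply(List<Double> values) {
      double total = 0;
      for (double v : values) {
        total += v;
      }
      return total;
    }
  },
  AVG {
    @Override
    public double apply(List<Double> values) {
      return values.isEmpty() ? Double.NaN : SUM.apply(values) / values.size();
    }
  };

  public abstract double apply(List<Double> values);
}
```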

Gauge:

Gauge g = Gauge.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
g.labels("a1", "b1", "c1").inc();
g.labels("a1", "b1", "c2").inc();
g.labels("a1", "b2", "c3").inc();
g.labels("a1", "b2", "c4").inc();

The original data:

metrics_name{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name{cluster="a1", namespace="b2", topic="c4"} 1

Aggregated at the cluster level:

metrics_name{cluster="a1"} 4

Aggregated at the [cluster, namespace] level:

metrics_name{cluster="a1", namespace="b1"} 2
metrics_name{cluster="a1", namespace="b2"} 2

Counter:

Counter c = Counter.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").create();
c.labels("a1", "b1", "c1").inc();
c.labels("a1", "b1", "c2").inc();
c.labels("a1", "b2", "c3").inc();
c.labels("a1", "b2", "c4").inc();

The original data:

metrics_name_total{cluster="a1", namespace="b1", topic="c1"} 1
metrics_name_total{cluster="a1", namespace="b1", topic="c2"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c3"} 1
metrics_name_total{cluster="a1", namespace="b2", topic="c4"} 1

Aggregated at the cluster level:

metrics_name_total{cluster="a1"} 4

Aggregated at the [cluster, namespace] level:

metrics_name_total{cluster="a1", namespace="b1"} 2
metrics_name_total{cluster="a1", namespace="b2"} 2

Histogram:

Histogram h = Histogram.build("metrics_name", "help").labelNames("cluster", "namespace", "topic").buckets(100, 200, 500).create();
h.labels("a1", "b1", "c1").observe(50);
h.labels("a1", "b1", "c1").observe(150);
h.labels("a1", "b1", "c1").observe(400);
h.labels("a1", "b1", "c2").observe(50);
h.labels("a1", "b1", "c2").observe(150);
h.labels("a1", "b1", "c2").observe(400);
h.labels("a1", "b2", "c3").observe(50);
h.labels("a1", "b2", "c3").observe(150);
h.labels("a1", "b2", "c3").observe(400);
h.labels("a1", "b2", "c4").observe(50);
h.labels("a1", "b2", "c4").observe(150);
h.labels("a1", "b2", "c4").observe(400);

The original data:

metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c1",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b1",topic="c1",} 3.0
metrics_name_sum{cluster="a1",namespace="b1",topic="c1",} 600.0

metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b1",topic="c2",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b1",topic="c2",} 3.0
metrics_name_sum{cluster="a1",namespace="b1",topic="c2",} 600.0

metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c3",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b2",topic="c3",} 3.0
metrics_name_sum{cluster="a1",namespace="b2",topic="c3",} 600.0

metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="100.0",} 1.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="200.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="500.0",} 3.0
metrics_name_bucket{cluster="a1",namespace="b2",topic="c4",le="+Inf",} 3.0
metrics_name_count{cluster="a1",namespace="b2",topic="c4",} 3.0
metrics_name_sum{cluster="a1",namespace="b2",topic="c4",} 600.0

Aggregated at the cluster level:

metrics_name_bucket{cluster="a1",le="100.0",} 4.0
metrics_name_bucket{cluster="a1",le="200.0",} 8.0
metrics_name_bucket{cluster="a1",le="500.0",} 12.0
metrics_name_bucket{cluster="a1",le="+Inf",} 12.0
metrics_name_count{cluster="a1",} 12.0
metrics_name_sum{cluster="a1",} 2400.0

Aggregated at the [cluster, namespace] level:

metrics_name_bucket{cluster="a1",namespace="b2",le="100.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b2",le="200.0",} 4.0
metrics_name_bucket{cluster="a1",namespace="b2",le="500.0",} 6.0
metrics_name_bucket{cluster="a1",namespace="b2",le="+Inf",} 6.0
metrics_name_count{cluster="a1",namespace="b2",} 6.0
metrics_name_sum{cluster="a1",namespace="b2",} 1200.0

metrics_name_bucket{cluster="a1",namespace="b1",le="100.0",} 2.0
metrics_name_bucket{cluster="a1",namespace="b1",le="200.0",} 4.0
metrics_name_bucket{cluster="a1",namespace="b1",le="500.0",} 6.0
metrics_name_bucket{cluster="a1",namespace="b1",le="+Inf",} 6.0
metrics_name_count{cluster="a1",namespace="b1",} 6.0
metrics_name_sum{cluster="a1",namespace="b1",} 1200.0
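Because all series share the same bucket boundaries, histogram aggregation is just element-wise summation of the cumulative bucket counts, plus summation of _count and _sum. A rough standalone sketch (hypothetical helper, not the client's API):

```java
import java.util.Arrays;

// Hypothetical sketch: aggregate histogram snapshots that share the same bucket
// boundaries by summing cumulative bucket counts, observation counts, and sums.
public class HistogramMergeSketch {

  static class Snapshot {
    final double[] cumulativeBuckets; // counts for le="100", le="200", le="500", le="+Inf"
    final double count;
    final double sum;

    Snapshot(double[] cumulativeBuckets, double count, double sum) {
      this.cumulativeBuckets = cumulativeBuckets;
      this.count = count;
      this.sum = sum;
    }
  }

  static Snapshot merge(Snapshot... snapshots) {
    double[] buckets = new double[snapshots[0].cumulativeBuckets.length];
    double count = 0, sum = 0;
    for (Snapshot s : snapshots) {
      for (int i = 0; i < buckets.length; i++) {
        buckets[i] += s.cumulativeBuckets[i];
      }
      count += s.count;
      sum += s.sum;
    }
    return new Snapshot(buckets, count, sum);
  }

  public static void main(String[] args) {
    // Each topic-level series above observed 50, 150, 400 -> buckets [1, 2, 3, 3], count 3, sum 600.
    Snapshot topic = new Snapshot(new double[] {1, 2, 3, 3}, 3, 600);
    // Four topic series under cluster a1 -> buckets [4, 8, 12, 12], count 12, sum 2400.
    Snapshot cluster = merge(topic, topic, topic, topic);
    System.out.println(Arrays.toString(cluster.cumulativeBuckets)
        + " count=" + cluster.count + " sum=" + cluster.sum);
  }
}
```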

Summary:

Unlike the meters above, SUMMARY is special. For the metrics_name_count and metrics_name_sum series we have to use the SUM aggregator, but for the time series with a quantile label, I think the AVG aggregator is the best choice.
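A rough sketch of that split (hypothetical helper; note that averaging quantile values is only an approximation, since exact quantiles are not mergeable):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: aggregate summary snapshots by SUMming _count/_sum and
// AVGing the per-quantile values. Assumes all snapshots expose the same quantiles.
// Averaging quantiles is only an approximation; exact quantiles cannot be merged.
public class SummaryMergeSketch {

  static class Snapshot {
    final double count;
    final double sum;
    final Map<Double, Double> quantiles; // quantile -> observed value

    Snapshot(double count, double sum, Map<Double, Double> quantiles) {
      this.count = count;
      this.sum = sum;
      this.quantiles = quantiles;
    }
  }

  static Snapshot merge(List<Snapshot> snapshots) {
    double count = 0, sum = 0;
    Map<Double, Double> quantileTotals = new LinkedHashMap<>();
    for (Snapshot s : snapshots) {
      count += s.count;
      sum += s.sum;
      for (Map.Entry<Double, Double> q : s.quantiles.entrySet()) {
        quantileTotals.merge(q.getKey(), q.getValue(), Double::sum);
      }
    }
    Map<Double, Double> averaged = new LinkedHashMap<>();
    for (Map.Entry<Double, Double> q : quantileTotals.entrySet()) {
      averaged.put(q.getKey(), q.getValue() / snapshots.size());
    }
    return new Snapshot(count, sum, averaged);
  }
}
```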

  • [x] I'm willing to submit the PR

tjiuming (Dec 08 '22 09:12)

@tjiuming For the group by label name / Counter scenario, you can write your own Collector and register it. This should also work for other types of aggregation/summation.

Example JUnit test / CounterGroupByCollector

package io.prometheus.client;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.util.ArrayList;
import java.util.List;

public class CounterGroupByCollectorTest {

  CollectorRegistry registry;
  Counter counter;
  CounterGroupByCollector counterGroupByCollector;

  @Before
  public void setUp() {
    registry = new CollectorRegistry();
    counter = Counter.build("metrics_name", "metrics_name help").labelNames("cluster", "namespace", "topic").create();
    counterGroupByCollector = new CounterGroupByCollector(counter);
  }

  @Test
  public void test() {
    counter.labels("a1", "b1", "c1").inc();
    counter.labels("a1", "b1", "c2").inc();
    counter.labels("a1", "b2", "c3").inc();
    counter.labels("a1", "b2", "c4").inc();

    System.out.println("Group by \"cluster\"...");
    counterGroupByCollector.groupBy("cluster");

    List<Collector.MetricFamilySamples> mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("No group by...");
    counterGroupByCollector.groupBy(null);

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("Group by \"cluster\", \"namespace\"...");
    counterGroupByCollector.groupBy("cluster", "namespace");

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }

    System.out.println("---");
    System.out.println("Group by \"cluster\", \"namespace\", \"topic\"...");
    counterGroupByCollector.groupBy("cluster", "namespace", "topic");

    mfs = counterGroupByCollector.collect();
    for (Collector.MetricFamilySamples samples : mfs) {
      for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
        System.out.println(String.format("sample [%s]", sample));
      }
    }
  }

  public static class CounterGroupByCollector extends Collector {

    private Counter counter;
    private String[] groupByLabelNames;

    public CounterGroupByCollector(Counter counter) {
      this.counter = counter;
    }

    public void groupBy(String ... labelNames) {
      if ((labelNames == null) || (labelNames.length == 0)) {
        synchronized (this) {
          groupByLabelNames = null;
        }

        return;
      }

      if (labelNames.length > counter.labelNames.size()) {
        throw new IllegalArgumentException("Group by label names contain more labels than the Counter");
      }

      List<String> labelNameList = toList(labelNames);
      List<String> counterLabelNameList = counter.labelNames;

      for (int i = 0; i < labelNameList.size(); i++) {
        if (!labelNameList.get(i).equals(counterLabelNameList.get(i))) {
          throw new IllegalArgumentException("Group by label names are not a prefix of the Counter label names");
        }
      }

      synchronized (this) {
        this.groupByLabelNames = labelNames;
      }
    }

    @Override
    public List<MetricFamilySamples> collect() {
      String[] localGroupByLabelNames;
      synchronized (this) {
        localGroupByLabelNames = groupByLabelNames;
      }

      if (localGroupByLabelNames == null) {
        return counter.collect();
      }

      Counter localCounter = Counter
              .build("metrics_name", "metrics_name help")
              .labelNames(localGroupByLabelNames)
              .create();

      List<Collector.MetricFamilySamples> mfs = counter.collect();
      for (Collector.MetricFamilySamples samples : mfs) {
        for (Collector.MetricFamilySamples.Sample sample : samples.samples) {
          if (sample.name.endsWith("_total")) {
            String[] labelValues = sample.labelValues.subList(0, localGroupByLabelNames.length).toArray(new String[localGroupByLabelNames.length]);
            localCounter.labels(labelValues).inc(sample.value);
          }
        }
      }

      return localCounter.collect();
    }
  }

  private static List<String> toList(String ... values) {
    List<String> list = new ArrayList<String>(values.length);
    for (String value : values) {
      list.add(value);
    }
    return list;
  }
}

dhoard (Feb 12 '23 05:02)