pulsar [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

trafficstars

Fixes #8407 Fixes #13865

Motivation

Current broker prometheus metrics are not grouped by metric type which causes issues in systems that read these metrics (e.g. DataDog).

Prometheus docs states "All lines for a given metric must be provided as one single group" - https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#grouping-and-sorting

Modifications Updated the namespace and topic prometheus metric generators to group the metrics under the appropriate type header. Updated function worker stats to include TYPE headers

Verifying this change

[x] Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

Added unit test to verify all metrics are grouped under correct type header

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API: (no)
The schema: (no)
The default values of configurations: (no)
The wire protocol: (no)
The rest endpoints: (no)
The admin cli options: (no)
Anything that affects deployment: (no)

Documentation

Need to update docs?

[X] doc-not-needed Changes to match prometheus spec

May 12 '22 08:05 marksilcox

@codelipenghui can you look at this?

May 12 '22 08:05 marksilcox

@eolivelli could you please approve the workflows and review again

May 17 '22 14:05 marksilcox

/pulsarbot run-failure-checks

May 18 '22 08:05 marksilcox

/pulsarbot run-failure-checks

May 20 '22 08:05 marksilcox

/pulsarbot run cpp-tests

May 25 '22 11:05 marksilcox

@asafm I thought of another way of grouping these metrics, how about write metrics-data into a same ByteBuf which has the same metric name? In the finally, merge these ByteBufs into stream, seems it won't takes too much costs.

Besides, over all looks good to me.

Jun 21 '22 12:06 tjiuming

@tjiuming Maybe you meant @marksilcox ?

I have a note regarding the way we write NamespaceStatsAggregator. The whole idea in the way the metric system is built today in Pulsar (I know it's fractured) is to try to avoid memory allocation as much as possible. It does so by:

Using long and AtomicLongFieldUpdater instead of AtomicLong
Writing the values directly into ByteBuf and then to the stream.
In some cases directly into the stream

In your refactor, we add a Sample object each containing 2 lists and a value (Double). On the way, we create a Map, and of course a MetricFamilySamples object. The sample is done per metrics we have. I'm just starting to learn the metrics code in Pulsar, but I think it beats the point of the original code. I believe @merlimat can correct me.

What I suggest is to build the code to achieve the goal you desire, each sample belonging to a specific metric will be written one after another.

How?

Let's collect all TopicStats. We pay the cost of allocating them anyway. Once we have that, we can write a function writing each individual metric samples. Each will call a function that looks like: writeMetric(stream, "msgsRate", cluster, namespace, topicToStatsMap, topicStats -> topicStats.msgsRate)

WDYT?

Jun 21 '22 15:06 asafm

@asafm

I may be missing something, but to be able to print the metrics in the Prometheus format we would still need to collate all TopicStats and all AggregatedNamespaceStats as they share metric types (e.g. there is a pulsar_rate_in metric for both the namespace and all topics in that namespace). Then aggregate across all by metric name.

There are usages elsewhere in Pulsar of MetricFamilySamples, which is why I went down this route rather than creating a new model to hold these relationships.

Jun 21 '22 16:06 marksilcox

NamespaceStats and TopicStat do share metrics as they use the same metrics but aggregate to different level. One is raw, at topic level, and the other is simple aggregation to namespace level. The includeTopics flag is actually an either: includeTopics actually means, print only topic level metrics, while false means print only namespace level metrics. So there's no collision then no?

Jun 21 '22 18:06 asafm

I definitely agree that for the developer you'll be spending more time writing this and checking for collisions, and maybe printing from one class one variable and then another class same variable, but you will obtain the end result of not increasing memory allocations. The more I read into Pulsar metrics code, the more I see techniques used to decrease memory allocations. Just today I saw in TransactionsAggregator that they avoid allocating objects and keeping the object in ThreadLocal and reset it every time they need it, like object pool, but with a fixed size (number of threads).

I guess @merlimat and @codelipenghui can contribute their knowledge here on how important that is, as I'm fairly new to this project.

Jun 21 '22 18:06 asafm

@marksilcox Please provide a correct documentation label for your PR. Instructions see Pulsar Documentation Label Guide.

Jun 22 '22 07:06 github-actions[bot]

The problem we have is that we need to collect by metric name, and not by topic/namespace as is currently happening. So for a metric such as pulsar_rate_in whether includeTopics is enabled or not we need all values of this metric grouped under a single type header.

This means whatever is used needs to store each metric in a way so other namespaces or topics can add there values. I guess what I need to find is the most memory efficient way of managing this.

Jun 22 '22 14:06 marksilcox

The problem we have is that we need to collect by metric name, and not by topic/namespace as is currently happening. So for a metric such as pulsar_rate_in whether includeTopics is enabled or not we need all values of this metric grouped under a single type header.

This means whatever is used needs to store each metric in a way so other namespaces or topics can add there values. I guess what I need to find is the most memory efficient way of managing this.

I believe you can stick to the same data structures used now. Just when printing it, you iterate over the different data structures. For example, pulsa_rate_in, you iterate over the the TopicStats, grab the value from there and print it. Do that for every metric.

Jun 22 '22 15:06 asafm

@asafm I have now updated following the pattern you suggest. I'm not totally familiar with FastThreadLocal but I think I have minimised the object creation where possible. Would be grateful for some further pointers if this could be improved

Jun 23 '22 16:06 marksilcox

/pulsarbot run-failure-checks

Jun 24 '22 10:06 marksilcox

@asafm That makes sense, I hadn't thought about the impact with that volume of topics (we currently have relatively few in comparison). I will look at using ByteBuf as suggested.

Jul 01 '22 08:07 marksilcox

/pulsarbot run-failure-checks

Jul 05 '22 07:07 marksilcox

@merlimat Sorry to nag again - could you please take a look at the comments I mentioned you at? I want to be certain here.

Jul 17 '22 09:07 asafm

LGTM

Jul 17 '22 16:07 tjiuming

/pulsarbot run-failure-checks

Jul 20 '22 09:07 marksilcox

All looks good on my end :) I'll try to find someone with write permission to wrap this up

Aug 08 '22 09:08 asafm

@eolivelli Do you want to re-review your requested changes?

Aug 08 '22 09:08 asafm

/pulsarbot run-failure-checks

Aug 10 '22 11:08 marksilcox

@marksilcox I'm compiling a document detailing exactly how the metric system works today in Pulsar, and while doing so I've come to understand how Pulsar Function metrics work. The bad news: They also violate the grouping of the metric name. You will only experience it when you scrape Pulsar Function Worker process metrics or if you run it embedded within a Pulsar Broker.

The code is in FunctionsStatsGenerator. So, the function there iterates over the different function runtimes. For example, if this worker runs function A with a replication factor of 2, and mode Process, it will spin up two operating system processes running JavaInstanceStarter which executes the function. Inside it runs a Prometheus HTTPServer which exposes its Prometheus metrics on a defined metrics port. Each process exposes the same metric name, so when FunctionsStatsGenerator aggregates them, it simply dumps one output into the same stream. This will probably require a separate PR with a different fix.

Aug 16 '22 10:08 asafm

@asafm Not sure I will have the bandwidth to look at FunctionsStatsGenerator anytime soon

Aug 16 '22 16:08 marksilcox

@marksilcox Your work was great. We can settle for opening an issue and linking this PR to it - what do you say?

Aug 17 '22 08:08 asafm

Just need to see how to get those tests green. @nlu90 How can we move forward?

Aug 17 '22 08:08 asafm

/pulsar-bot rerun-failure-checks

Aug 17 '22 08:08 eolivelli

there is a checkstyle violation:

[INFO] There is 1 error reported by Checkstyle 8.37 with ../buildtools/src/main/resources/pulsar/checkstyle.xml ruleset. Error: src/main/java/org/apache/pulsar/broker/stats/prometheus/NamespaceStatsAggregator.java:[372] (regexp) RegexpSingleline: Trailing whitespace [INFO] ------------------------------------------------------------------------

Aug 17 '22 08:08 eolivelli

@marksilcox Your work was great. We can settle for opening an issue and linking this PR to it - what do you say?

@asafm there is already an open issue (#8407) for this, or is a new one required?

Aug 21 '22 09:08 marksilcox

pulsar pulsar copied to clipboard

[fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Motivation

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

pulsar
pulsar copied to clipboard