seastar metrics: Add update_aggregate

Extends the metrics api to allow changing the aggregation labels of a metrics family.

Otherwise one had to un-register every single metric instance in a metric family and then re-register with the changed aggregation labels.

For metric families with thousands of instances (e.g.: histograms with lots of different labels) this is quite expensive.

With this change we avoid the full reconstruction of the metrics family and all its metrics. Only the work associated with marking the metrics dirty() is needed then.

Nov 24 '23 12:11 StephanDollberg

First some of the tests are failing, second, marking the metric with dirty will in fact recreate all the metadata, it's not saving anything, dirty happens on the next call to get the metrics values.

The idea with dirty is that on startup we usually register lots of metrics, to avoid recreating the meta data object on each metric registration (a quadratic behavior) we delay it.

If done properly, changing the aggregation label could avoid the dirty all together.

What is the use case of changing the aggregation labels in the first place?

Nov 28 '23 07:11 amnonh

One more thing, if there is a use case to change the aggregate labels (I would like to see), it would be best if we can add it the relabel_config

Nov 28 '23 07:11 amnonh

First some of the tests are failing, second, marking the metric with dirty will in fact recreate all the metadata, it's not saving anything, dirty happens on the next call to get the metrics values.

The failing tests are https://github.com/scylladb/seastar/issues/1913

The idea with dirty is that on startup we usually register lots of metrics, to avoid recreating the meta data object on each metric registration (a quadratic behavior) we delay it.

Understand, maybe I am missing your point but we will have to recreate it once the aggregation labels are changed?

If done properly, changing the aggregation label could avoid the dirty all together.

I don't immediately see how that would work? You mean iterating over the queue instead and updating in place or something? I guess that would be a slight perf improvement.

What is the use case of changing the aggregation labels in the first place?

The tradeof with aggregation is that you lose information at the cost having less metrics series which reduces load/costs on your metrics infra.

In redpanda we generally have two modes, one that adds lots of detailed labels and gives you the full info while the second aggregates lots of those labels away as metrics cardinality would be way too high otherwise which would make metrics infra costs sky rocket.

The former is generally fine but during incidents having the full info is useful. Hence this change to cheaply update the aggregation labels.

One more thing, if there is a use case to change the aggregate labels (I would like to see), it would be best if we can add it the relabel_config

I don't immediately see how this would be beneficial. The whole label relabing part is currently unrelated to the aggregation labels. They only come in at scrape time. Hence the API as added in this PR is a lot simpler.

Nov 28 '23 10:11 StephanDollberg

First some of the tests are failing, second, marking the metric with dirty will in fact recreate all the metadata, it's not saving anything, dirty happens on the next call to get the metrics values.

The failing tests are #1913

The idea with dirty is that on startup we usually register lots of metrics, to avoid recreating the meta data object on each metric registration (a quadratic behavior) we delay it.

Understand, maybe I am missing your point but we will have to recreate it once the aggregation labels are changed?

I'm saying that if you call dirty() the entire meta data will need to be re-created.

If done properly, changing the aggregation label could avoid the dirty all together.

I don't immediately see how that would work? You mean iterating over the queue instead and updating in place or something? I guess that would be a slight perf improvement.

I'm saying that aggregation label are only applicable on reporting, so you don't need to rebuild the meta-data

What is the use case of changing the aggregation labels in the first place?

The tradeof with aggregation is that you lose information at the cost having less metrics series which reduces load/costs on your metrics infra.

In redpanda we generally have two modes, one that adds lots of detailed labels and gives you the full info while the second aggregates lots of those labels away as metrics cardinality would be way too high otherwise which would make metrics infra costs sky rocket.

The former is generally fine but during incidents having the full info is useful. Hence this change to cheaply update the aggregation labels.

How do you see that it's going to work? In practice, in run-time, how are you going to change that?

One more thing, if there is a use case to change the aggregate labels (I would like to see), it would be best if we can add it the relabel_config

I don't immediately see how this would be beneficial. The whole label relabing part is currently unrelated to the aggregation labels. They only come in at scrape time. Hence the API as added in this PR is a lot simpler.

It might be simpler, but you just add an API to the metric layer that is disconnected from the API we already have and it's unclear to me how it's going to be used.

Nov 28 '23 10:11 amnonh

I'm saying that if you call dirty() the entire meta data will need to be re-created.

I'm saying that aggregation label are only applicable on reporting, so you don't need to rebuild the meta-data

The metadata is used when reporting though so if metadata is not rebuilt then it will use some outdated aggregation labels? This is also something I was running into initially when I was not setting the dirty flag.

How do you see that it's going to work? In practice, in run-time, how are you going to change that?

So how we use this in redpanda is that we have registry for each metric which tracks the aggregation labels for either state of our "aggregate metrics" config flag being true or false. When the config flag is changed at runtime we call update_aggregate_labels for each metric family with the aggregation labels that we then want to use.

Previously the config flag was just checked on startup/metric creation time.

You can see the specific code here: https://github.com/redpanda-data/redpanda/blob/dev/src/v/metrics/metrics_registry.cc#L55-L65

It might be simpler, but you just add an API to the metric layer that is disconnected from the API we already have and it's unclear to me how it's going to be used.

I really don't see how the relable_config API is related. set_relabel_configs applies the relabel config to all metric families and metric series. This can be useful if one wants to change a label on all series (for example shard) but not if wanting to change the aggregation label for one specific metric family.

While one could certainly shoehorn this in the relabel config API it would be inefficient at that. It has something like O(S*R) complexity (S == metric series, R == relable configs) with very high per OP cost because of the regex match. Further it would require a separate relabel config for each metric family. The new API is O(1) as it really just concerns itself with a single metric family.

Conceptually I also think of relabel config happening on a layer above metric aggregation.

Nov 28 '23 15:11 StephanDollberg

The metadata is used when reporting though so if metadata is not rebuilt then it will use some outdated aggregation labels? This is also something I was running into initially when I was not setting the dirty flag.

All metrics with their data are stored in _value_map, a subset of the metrics (those that are enabled) are stored in _metadata which is updated when you set _dirty to true.

Changing the aggregation labels, does not require to change the structure of the _metadata you can update the labels directly and save the rebuild of the structure (which is expensive).

Another alternative (which is probably even better) is to remove the duplication and have _metadata hold a shard_ptr pointing to the _value_map, it would cost another redirection but it would save memory and time when creating the copy.

How do you see that it's going to work? In practice, in run-time, how are you going to change that?

So how we use this in redpanda is that we have registry for each metric which tracks the aggregation labels for either state of our "aggregate metrics" config flag being true or false. When the config flag is changed at runtime we call update_aggregate_labels for each metric family with the aggregation labels that we then want to use.

Previously the config flag was just checked on startup/metric creation time.

You can see the specific code here: https://github.com/redpanda-data/redpanda/blob/dev/src/v/metrics/metrics_registry.cc#L55-L65

How do you make sure that it runs over all shards? Your implementation is very specific for a single metric, this functionality will be useful for multiple metrics, for example, you can use regular expression or prefix matching for the full metric name.

Nov 29 '23 11:11 amnonh

Looks reasonable to me, but will wait for @amnonh to approve.

Nov 29 '23 18:11 avikivity

You can rebase to clear the CI failures.

Nov 29 '23 18:11 avikivity

Changing the aggregation labels, does not require to change the structure of the _metadata you can update the labels directly and save the rebuild of the structure (which is expensive).

Another alternative (which is probably even better) is to remove the duplication and have _metadata hold a shard_ptr pointing to the _value_map, it would cost another redirection but it would save memory and time when creating the copy.

Ah I think I was misunderstanding you here. So what you are saying is that we shouldn't update the whole of _metadata (as is done by marking dirty==true) but just do a targeted update of the aggregation labels?

This of course makes sense and I will have a look at some of your suggestions.

How do you make sure that it runs over all shards?

Each shard has to run it locally (similar to how each shard registers metrics separately).

Your implementation is very specific for a single metric, this functionality will be useful for multiple metrics, for example, you can use regular expression or prefix matching for the full metric name.

Yes this is intentional though. Each metric family has its own aggregation labels so changing multiple at the same time doesn't really make sense.

I can certainly see where something like the relabel_config approach is useful but the API as added really just serves as a counterpart to updating some of the fields that were previously specified when registering the metrics.

Nov 29 '23 18:11 StephanDollberg

Had a look at partially updating _metadata.

It's actually a bit more tricky than I thought as the whole thing is shared across shards via a shared_ptr so we can't easily change anything in there without extra protection.

We could simplify update_metrics_if_needed to only do "half" of the update (i.e.: only update _metadata and not _current_metrics) but I don't actually think this will save much as we would still need to rebuild lots of the structures. So probably not worth the effort.

Possibly we could do something a bit more tricky like doing a best effort update by checking _metadata.use_count() in update_metrics_if_needed and then update the individual members in _metadata directly if needed. I guess in practice this should work out fine given how the call chains work and users of get_values will like have dropped their shared_ptr before calling get_values the next time (which will then trigger the update). I can get up a patch like that if you think it's worth persuing.

Dec 01 '23 11:12 StephanDollberg

seastar
seastar copied to clipboard

metrics: Add update_aggregate_labels()

seastar seastar copied to clipboard

metrics: Add update_aggregate_labels()

seastar
seastar copied to clipboard