kprom: Multiple clients
Issue
The main goal of this PR is to add the ability for the kprom plugin to differentiate between multiple clients.
The current implementation of the WithClientLabel() option in the kprom package assigns metrics to the most recently registered kgo client, which leads to metrics being grouped under a static client_id. The goal is to keep the existing API but make the client_id label dynamic, allowing it to reflect the actual client making the request rather than being statically set at first registration (see twmb#815).
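For reference, a minimal sketch of the setup that exhibits the issue, assuming a local broker and placeholder client IDs: a single kprom.Metrics instance created with WithClientLabel() is shared by two clients, so all metrics end up grouped under whichever client_id was registered first.

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/twmb/franz-go/pkg/kgo"
	"github.com/twmb/franz-go/plugin/kprom"
)

func main() {
	// One shared Metrics instance with WithClientLabel(): the client_id label
	// is fixed when the first client registers, so metrics from both clients
	// are grouped under a single client_id.
	metrics := kprom.NewMetrics("kafka_stats",
		kprom.Registerer(prometheus.DefaultRegisterer),
		kprom.WithClientLabel(),
	)

	primary, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"), // placeholder broker
		kgo.ClientID("primary"),
		kgo.WithHooks(metrics),
	)
	if err != nil {
		panic(err)
	}
	defer primary.Close()

	secondary, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"),
		kgo.ClientID("secondary"),
		kgo.WithHooks(metrics), // same hook instance shared with the first client
	)
	if err != nil {
		panic(err)
	}
	defer secondary.Close()
}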
Only a subset of metrics will be affected by this change:
- write_bytes_total
- write_errors_total
- write_wait_seconds
- write_time_seconds
- read_bytes_total
- read_errors_total
- read_time_seconds
- request_duration_e2e_seconds
- request_throttled_seconds
- produce_compressed_bytes_total
- produce_(uncompressed_)bytes_total
- produce_batches_total
- produce_records_total
- fetch_compressed_bytes_total
- fetch_(uncompressed_)bytes_total
- fetch_batches_total
- fetch_records_total
Metrics that will continue to have a constant client_id:
- connects_total
- connect_errors_total
- disconnects_total
- buffered_fetch_records_total
- buffered_produce_records_total
Testing
I tested the updated metrics locally using a combination of consumers and producers. When scraping the metrics endpoint, the metrics are now properly separated by the client_id label, even when two producers and two consumers with different client IDs write to and read from the same topic:
# HELP test_buffered_fetch_records_total Total number of records buffered within the client ready to be consumed
# TYPE test_buffered_fetch_records_total gauge
test_buffered_fetch_records_total{client_id="primary"} 0
# HELP test_buffered_produce_records_total Total number of records buffered within the client ready to be produced
# TYPE test_buffered_produce_records_total gauge
test_buffered_produce_records_total{client_id="primary"} 0
# HELP test_connects_total Total number of connections opened
# TYPE test_connects_total counter
test_connects_total{client_id="primary",node_id="1"} 8
test_connects_total{client_id="primary",node_id="seed_0"} 4
# HELP test_fetch_bytes_total Total number of uncompressed bytes fetched
# TYPE test_fetch_bytes_total counter
test_fetch_bytes_total{client_id="primary",node_id="1",topic="primary"} 800
test_fetch_bytes_total{client_id="secondary",node_id="1",topic="primary"} 800
# HELP test_produce_bytes_total Total number of uncompressed bytes produced
# TYPE test_produce_bytes_total counter
test_produce_bytes_total{client_id="primary",node_id="1",topic="primary"} 400
test_produce_bytes_total{client_id="secondary",node_id="1",topic="primary"} 400
# HELP test_read_bytes_total Total number of bytes read
# TYPE test_read_bytes_total counter
test_read_bytes_total{client_id="primary",node_id="1"} 14029
test_read_bytes_total{client_id="primary",node_id="seed_0"} 1166
test_read_bytes_total{client_id="secondary",node_id="1"} 13876
test_read_bytes_total{client_id="secondary",node_id="seed_0"} 1166
# HELP test_write_bytes_total Total number of bytes written
# TYPE test_write_bytes_total counter
test_write_bytes_total{client_id="primary",node_id="1"} 10080
test_write_bytes_total{client_id="primary",node_id="seed_0"} 172
test_write_bytes_total{client_id="secondary",node_id="1"} 10072
test_write_bytes_total{client_id="secondary",node_id="seed_0"} 180
Implementation
Since kgo doesn’t export client_id in a way that supports dynamic labeling in hooks, I had to make some modifications to the core library. My goal was to minimize changes to the kgo package itself and to avoid breaking any existing APIs (such as hooks). To that end, I chose not to add dynamic labels to the connection and buffered metrics, keeping those labels static.
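For illustration only, a rough sketch of the general direction rather than this PR's actual diff: kgo's broker hooks already receive a kgo.BrokerMetadata value, so carrying the client ID there would let a hook label per-request metrics dynamically. The ClientID field used below is an assumption about this PR's change and does not exist in released kgo; the hook and counter names are invented for the example.

package kpromsketch

import (
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/twmb/franz-go/pkg/kgo"
)

// writeBytesHook is a hypothetical hook that labels written bytes per client.
type writeBytesHook struct {
	writeBytes *prometheus.CounterVec // variable labels: client_id, node_id
}

// OnBrokerWrite implements kgo.HookBrokerWrite.
func (h *writeBytesHook) OnBrokerWrite(meta kgo.BrokerMetadata, _ int16, bytesWritten int, _, _ time.Duration, err error) {
	if err != nil {
		return
	}
	h.writeBytes.WithLabelValues(
		meta.ClientID,                  // assumed field added by this PR; not in released kgo
		strconv.Itoa(int(meta.NodeID)), // kprom itself renders seed brokers as "seed_N"
	).Add(float64(bytesWritten))
}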
I forgot this included changes to kgo when prepping the release I just cut. I'm aiming to work on the next batch of requests (largely caching improvements) for 1.19 over the next month, but that's going to delay my merge of this. Really sorry about missing that.
In the meantime, my first comment is on the doc comment "ClientID is pointer to ID of the client that made the request" => these aren't pointers (maybe you started with a pointer); the comment can be touched up. I'll review the rest of this more closely soon.
The main thing I'll want to ensure is that the default metrics do not change for anybody (no breaking change); any changes here will be opt-in.
Maybe create an additional method that builds hooks with some predefined labels, like NewHook(labels ...string), so we can create a hook that already populates labels for all metrics? And keep the default behaviour of filling the client_id label as it does now?
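(For illustration only, a hypothetical shape of the suggested helper; NewHookWithLabels and its labels argument are invented names for this sketch and are not part of kprom.)

package kpromext

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/twmb/franz-go/plugin/kprom"
)

// NewHookWithLabels is a hypothetical sketch of the suggested API; it does not
// exist in kprom. The idea: pre-populate const labels on every metric, while
// the default behaviour keeps filling only the client_id label as it does now.
func NewHookWithLabels(namespace string, labels prometheus.Labels, opts ...kprom.Opt) *kprom.Metrics {
	_ = labels // a real implementation would attach these const labels to every collector
	return kprom.NewMetrics(namespace, opts...)
}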
Thank you for making time for this PR!
Sorry for not responding sooner, but to answer your question: I designed this PR to not make any breaking changes for users of this library. When one uses the WithClientLabel option, it simply splits the metrics per individual client, as one would expect. No changes were made to kgo hooks that would disrupt their usage; I only used already-present metadata structures to pass in the required client ID.
I made four new commits to separate the content for review's sake. I plan to squash these before merging. One small addition that wasn't mentioned in the comment is restoring constLabels on the connection metrics; I accidentally removed them earlier and forgot to add them back.
@twmb Coming back to your review, I actually found out that my testing setup was flawed. The approach of instantiating the metrics separately per client instead of reusing the same object (while using kprom.Registerer) works as intended (example below).
primaryCl := newClient(kgo.SeedBrokers(brokers...), kgo.ClientID("primary"))
secondaryCl := newClient(kgo.SeedBrokers(brokers...), kgo.ClientID("secondary"))
...
func newClient(opts ...kgo.Opt) *kgo.Client {
	metrics := kprom.NewMetrics("kafka_stats",
		kprom.Registerer(prometheus.DefaultRegisterer),
		kprom.WithClientLabel(),
	)
	opts = append(opts, kgo.WithHooks(metrics))
	cl, err := kgo.NewClient(opts...)
	if err != nil {
		panic(err)
	}
	return cl
}
Based on this, I suggest closing this PR and the related issue. I'm terribly sorry for wasting your time, but I really appreciate your review and insights; I got quite a lot out of them. I hope to contribute in a more meaningful way next time. Thank you!
Per request in the PR, I'm closing this. I'm still open to moving forward with the PR (there were a few outstanding questions) but I'm not sure it's needed given the final comment here?
Yes, the comments are no longer relevant, thank you.