redpanda
Consumer Group metrics
Who is this for and what problem do they have today?
We do not really have good metrics for Consumer Group basics, e.g. lag, rebalancing/rejoin, heartbeats, latency.
Redpanda admins/Support trying to troubleshoot Consumer Group issues
What are the success criteria?
Metrics exposed for things like: join-rate*, heartbeats, lag, latencies
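To make the "lag" item concrete: consumer group lag is conventionally the partition's log end offset minus the group's committed offset, per partition. The sketch below only illustrates that arithmetic. The function name, the `(topic, partition)` key shape, and the choice to treat a partition with no committed offset as committed at 0 are assumptions for illustration, not Redpanda's implementation or metric names.

```python
from typing import Dict, Tuple

# Hypothetical key shape: (topic, partition) -> offset
TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def consumer_group_lag(end_offsets: Offsets, committed: Offsets) -> Offsets:
    """Return per-partition lag: log end offset minus committed offset."""
    lag: Offsets = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition without a committed offset counts as committed at 0.
        done = committed.get(tp, 0)
        # Clamp at 0 so a commit observed slightly ahead of the sampled end offset
        # does not produce a negative lag.
        lag[tp] = max(0, end - done)
    return lag


end_offsets = {("orders", 0): 120, ("orders", 1): 75}
committed = {("orders", 0): 100, ("orders", 1): 75}

per_partition = consumer_group_lag(end_offsets, committed)
total_lag = sum(per_partition.values())
```

Summing the per-partition values gives a single group-level number that is convenient for alerting, while the per-partition values help locate a stuck consumer.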
Why is solving this problem impactful?
Because it helps us troubleshoot issues with Consumer Groups (CG). Currently, if we want to see some of this information, for example "Handling join request/PreparingRebalance", we have to turn on TRACE logging for "kafka", which is very noisy.
Additionally, with metrics, customers can define alerts for conditions such as CGs rebalancing at a high rate.
Additional notes
for inspo: Consumer Group Metric