
producer/consumer metrics using redpanda_kafka_request_bytes_total are wrong

Open hcoyote opened this issue 1 year ago • 2 comments

This metric miscalculates usage while a learner event is in progress (decommission, node add, etc.). We should instead use the metrics below to determine on-the-wire produce- and consume-side throughput for the cluster.

The dashboard queries should be updated to:

Producer traffic:

sum(rate(redpanda_rpc_received_bytes{redpanda_server="kafka", redpanda_id="$redpanda_id"}[5m])) by (cluster)

Consumer traffic:

sum(rate(redpanda_rpc_sent_bytes{redpanda_server="kafka", redpanda_id="$redpanda_id"}[5m])) by (cluster)

Adjust the labels accordingly to fit the observability repo dashboards.
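
As a rough illustration of adjusting the labels, here is a minimal Python sketch that builds the two queries above from a set of label matchers. The helper name `throughput_query` and its parameters are hypothetical and not part of the observability repo; only the metric names, labels, window, and grouping are taken from this issue.

```python
def throughput_query(metric, label_matchers, window="5m", group_by="cluster"):
    """Build a PromQL string of the form sum(rate(metric{...}[window])) by (group_by).

    Hypothetical helper for illustration only; it just formats text and
    does not talk to Prometheus.
    """
    # Render matchers in insertion order, e.g. redpanda_server="kafka"
    matchers = ", ".join(f'{key}="{value}"' for key, value in label_matchers.items())
    return f"sum(rate({metric}{{{matchers}}}[{window}])) by ({group_by})"


# Label matchers as written in this issue; change them to fit a given dashboard.
KAFKA_LABELS = {"redpanda_server": "kafka", "redpanda_id": "$redpanda_id"}

PRODUCER_QUERY = throughput_query("redpanda_rpc_received_bytes", KAFKA_LABELS)
CONSUMER_QUERY = throughput_query("redpanda_rpc_sent_bytes", KAFKA_LABELS)

if __name__ == "__main__":
    print(PRODUCER_QUERY)
    print(CONSUMER_QUERY)
```

Changing the entries in `KAFKA_LABELS` or the `group_by` argument is the only adjustment needed to match a dashboard's own label scheme.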

hcoyote avatar Sep 26 '24 19:09 hcoyote

@hcoyote - does this fix the sharechat issue, where they see replication traffic on the consumer side?

bpraseed avatar Oct 09 '24 19:10 bpraseed

Neither redpanda_rpc_received_bytes nor redpanda_rpc_sent_bytes includes topic-level detail. In contrast, redpanda_kafka_request_bytes_total does provide topic-level detail (using the redpanda_topic label).

Whether or not that matters is down to the use case. In this example, I wouldn't say we can use one in place of the other.

pmw-rp avatar Nov 08 '24 09:11 pmw-rp