Burrow stops consuming from a single partition after segment roll
Hey,
We're having problems consuming messages from the __consumer_offsets topic. Sometimes, when a broker rolls a log segment for a partition, Burrow stops consuming from that partition without logging any error, warning, or info message. As a result, Burrow reports "fake" lag for every consumer group whose offset commits are written to that partition. So far we have seen this happen on only 2 partitions (12 and 13).
Example:
Log from the Kafka broker:
[2020-07-24 04:40:30,911] INFO [Log partition=__consumer_offsets-13, dir=/data_disk_0/kafka-logs] Rolled new log segment at offset 2375038040 in 2 ms. (kafka.log.Log)
and Prometheus showing the lag:

As a workaround, we scheduled a cron job that restarts Burrow every hour. That's why the lag disappears at 05:00 on the histogram above.
Topic details:
replication | 3
partitions | 50
segment.bytes | 104857600
compression.type | producer
cleanup.policy | compact
Burrow version: 1.3.4
Kafka brokers version: 2.1.0
Burrow config:
[storage.inmemory]
class-name="inmemory"
workers=20
[zookeeper]
servers=[ <zk_servers> ]
timeout=6
root-path="/burrow"
[cluster.kafka-p-vm]
class-name="kafka"
servers=[ <kafka_servers> ]
[consumer.kafka-p-vm]
class-name="kafka"
cluster="kafka-p-vm"
servers=[ <kafka_servers> ]
start-latest=true
group-blacklist=""
group-whitelist=""
[httpserver.default]
address=":8000"
[client-profile.default-client]
kafka-version="2.1.0"
Broker config: Kafka broker config
PS. I don't know if it's relevant, but these are the 2 biggest partitions in __consumer_offsets:
21G /data_disk_0/kafka-logs/__consumer_offsets-13
217M /data_disk_0/kafka-logs/__consumer_offsets-12
The next biggest one is:
99M /data_disk_0/kafka-logs/__consumer_offsets-47
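To work out which consumer groups are affected when partitions 12 and 13 stall, it may help to compute the __consumer_offsets partition for each group. As far as I understand, the broker picks it as abs(groupId.hashCode) % partition count, where hashCode is Java's String.hashCode. Below is a minimal sketch of that mapping, assuming the 50 partitions from the topic details above. The function names and the group IDs in the usage comment are made up for illustration.

```python
def java_string_hashcode(s: str) -> int:
    """Emulate Java's String.hashCode() (signed 32-bit, over UTF-16 code units)."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to a signed int, as Java would see it.
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that stores offsets for group_id.

    Assumption: the broker uses abs(hashCode) % num_partitions and maps
    Integer.MIN_VALUE to 0 before taking the modulo.
    """
    h = java_string_hashcode(group_id)
    positive = 0 if h == -2**31 else abs(h)
    return positive % num_partitions


def groups_on_partitions(group_ids, partitions, num_partitions=50):
    """Return the group IDs whose offsets land on any of the given partitions."""
    wanted = set(partitions)
    return [g for g in group_ids
            if offsets_partition_for(g, num_partitions) in wanted]


# Usage (hypothetical group names):
# groups_on_partitions(["billing-consumer", "search-indexer"], {12, 13})
```

Comparing the result with the groups for which Burrow shows the "fake" lag should confirm whether only groups on the stalled partitions are affected.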