Burrow
Burrow copied to clipboard
Polling the HTTP /lag endpoint causes strange results
This maybe my limited understanding, here, but it's worth me writing this down to see if I'm misguided or have found an issue. My setup is 1 x Burrow instance consuming a single Kafka cluster. I have a polling script that calls the /lag endpoint periodically and forwards the retrieved JSON payload to a timeseries DB for visualisation and alerting.
After a period of ~36 hours I start to see a large increase in the calculated current_lag value per Topic Partition. I can find no natural explanation for this lag in terms of an increase of messages produced, or a slowdown in the Consumers within the Group. What I have found is:
- Restarting the poller makes the
current_lagreset to zero. - Starting a second poller alongside causes different
current_lagvalues to be reported from the first poller.
To be clear: I have no fancy logic in the pollers - they simply perform a HTTP GET on v3/kafka/$CLUSTER/consumer/$CONSUMER_GROUP/lag and report the current_lag per Partition.
Have I misconfigured something?
This is Burrow v1.8.0