Burrow
Consumer status/lag showing NOTFOUND
Using the API endpoint /v3/kafka/CLUSTER/consumer, I am able to get a list of consumers. However, when I try to pass any of those into either /v3/kafka/CLUSTER/consumer/ENTITY_FROM_CALL/status or /v3/kafka/CLUSTER/consumer/ENTITY_FROM_CALL/lag, it returns "cluster or consumer group not found". I am also able to get a list of topics and then see the offsets for a particular topic.
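For reference, the calls look roughly like this (a minimal Python sketch, assuming Burrow's HTTP server is on localhost:8081 as in the [httpserver.default] section of the config below, and that the cluster is the dv_kafka one defined there):

import requests

BASE = "http://localhost:8081/v3/kafka/dv_kafka"  # assumed host and cluster name

# List the consumer groups Burrow knows about for this cluster.
consumers = requests.get(f"{BASE}/consumer").json().get("consumers", [])
print("consumers:", consumers)

# Then fetch status and lag for each group returned by the listing call;
# this is where "cluster or consumer group not found" comes back for me.
for group in consumers:
    for qualifier in ("status", "lag"):
        resp = requests.get(f"{BASE}/consumer/{group}/{qualifier}").json()
        print(group, qualifier, resp.get("error"), resp.get("message"))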
I am running Kafka 0.10.0.0; here is the config I am using:
[general]
pidfile="burrow.pid"
stdout-logfile="burrow.out"
[logging]
filename="logs/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true
[zookeeper]
servers=[ "dv-kafka-01:2181", "dv-kafka-02:2181", "dv-kafka-03:2181" ]
timeout=6
root-path="/burrow_v1"
[client-profile.dv_kafka]
client-id="burrow-lagchecker"
kafka-version="0.10.0.0"
[cluster.dv_kafka]
class-name="kafka"
servers=[ "dv-kafka-01:9092", "dv-kafka-02:9092", "dv-kafka-03:9092", "dv-kafka-04:9092", "dv-kafka-05:9092", "dv-kafka-06:9092", "dv-kafka-07:9092", "dv-kafka-08:9092", "dv-kafka-09:9092", "dv-kafka-10:9092" ]
client-profile="dv_kafka"
topic-refresh=120
offset-refresh=30
[consumer.dv_kafka]
class-name="kafka"
cluster="dv_kafka"
servers=[ "dv-kafka-01:9092", "dv-kafka-02:9092", "dv-kafka-03:9092", "dv-kafka-04:9092", "dv-kafka-05:9092", "dv-kafka-06:9092", "dv-kafka-07:9092", "dv-kafka-08:9092", "dv-kafka-09:9092", "dv-kafka-10:9092" ]
client-profile="dv_kafka"
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""
offset-topic="__consumer_offsets"
[httpserver.default]
address=":8081"
Having the exact same issue. Does anyone have input on this?
Usually when you see something like this (a group exists, and then returns 404 on fetching status), it's because all of the offsets for the group are older than the expire-group time. Burrow does lazy expiration of this data only when the detail for the group is fetched (calling the consumer endpoint with either status, lag, or no qualifier, or by a notifier checking it).
It's hard to check for this within Burrow (as there's no way to fetch the data without causing it to be expired), but you can turn on debug logging, either by starting Burrow with debug logging enabled or by changing the log level via the API endpoint. I suggest restarting Burrow with debug logging enabled in the config. This will log details about every single offset that is read. You can then go through the logs, look for the offsets for your group, and see what their timestamps are. You'll also see a "purge expired consumer" message when the expiration is triggered.
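If it helps, here is a rough way to do that scan over the log (a Python sketch, assuming the debug log entries are JSON objects, one per line, with consumer and timestamp fields like the snippet further down in this thread; the group name and expire-group value are placeholders):

import json
import time

LOG_PATH = "logs/burrow.log"       # path from the [logging] section above
GROUP = "my-consumer-group"        # hypothetical group name; replace with yours
EXPIRE_GROUP_SECS = 7 * 24 * 3600  # placeholder; use your configured expire-group

now_ms = time.time() * 1000
with open(LOG_PATH) as log:
    for line in log:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip non-JSON lines
        if entry.get("consumer") != GROUP or "timestamp" not in entry:
            continue
        age_secs = (now_ms - entry["timestamp"]) / 1000
        flag = "EXPIRED" if age_secs > EXPIRE_GROUP_SECS else "ok"
        print(entry.get("topic"), entry.get("partition"), entry["timestamp"], flag)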
@Odinodin Someone else I work with found the issue: Burrow was still reading all of the offsets from __consumer_offsets and had not finished when I was trying to query the lag/status endpoints for a consumer. That seems to explain why listing the consumers works even though Burrow did not yet know the metadata for those consumers; it was still working through that topic gathering everything.
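In practice that means the status/lag calls have to be retried after a restart until Burrow has worked its way through __consumer_offsets; a rough polling sketch (Python, same assumed host and cluster as the snippet above, with a hypothetical group name):

import time
import requests

BASE = "http://localhost:8081/v3/kafka/dv_kafka"  # assumed host and cluster name
GROUP = "my-consumer-group"                       # hypothetical group name

# Keep polling the lag endpoint until Burrow reports partitions for the group,
# i.e. it has read far enough into __consumer_offsets to know about it.
while True:
    body = requests.get(f"{BASE}/consumer/{GROUP}/lag").json()
    status = body.get("status") or {}
    if not body.get("error") and status.get("partition_count", 0) > 0:
        break
    time.sleep(30)  # give Burrow more time to work through the offsets topic
print("Burrow has caught up; lag data should be meaningful now.")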
Hi, I have the same issue here. I'm trying to monitor Kafka consumers on our staging machine; below is the config file:
[general]
pidfile="./burrow.pid"
stdout-logfile="burrow.out"
[logging]
filename="log/burrow.log"
level="debug"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true
[zookeeper]
servers=[ "example.com:2181" ]
timeout=6
[client-profile.stage]
kafka-version="0.10.2"
[client-profile.production]
kafka-version="0.10.2"
[cluster.stage]
class-name="kafka"
servers=[ "example.com:9092" ]
client-profile="stage"
topic-refresh=120
offset-refresh=30
zookeeper-offset=true
[consumer.stage]
class-name="kafka"
cluster="stage"
servers=[ "example.com:9092" ]
client-profile="stage"
offsets-topic="__consumer_offsets"
#group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-blacklist=".*"
group-whitelist=""
[httpserver.default]
address=":18002"
I can also see all the consumers but can't get the lag for a consumer; below is a snippet of the log:
{"level":"debug","ts":1524773696.0555763,"msg":"dropped","type":"module","coordinator":"storage","class":"inmemory","name":"default","worker":7,"cluster":"stage","consumer":"ca_raw","topic":"ca_raw","partition":1,"topic_partition_count":0,"offset":24,"timestamp":1484519101275,"owner":"","request":"StorageSetConsumerOffset","reason":"old offset"}
Also, this is the result I got from hitting the /lag endpoint:
{
  "error": false,
  "message": "consumer status returned",
  "request": {
    "host": "MacBook-Pro.local",
    "url": "/v3/kafka/stage/consumer/ca_raw/lag"
  },
  "status": {
    "cluster": "stage",
    "complete": 1,
    "group": "ca_raw",
    "maxlag": null,
    "partition_count": 0,
    "partitions": [],
    "status": "NOTFOUND",
    "totallag": 0
  }
}
Let me know if you need more information. Thanks in advance!
Same issue here. Burrow runs just fine for a couple of days and then this exact problem starts happening. Restarting solves the issue temporarily for us, but a fix would be great. Edit: now even restarting doesn't solve the issue anymore :/
I'm also seeing this issue for offsets that should be fairly recent.
Just found the same issue after Burrow restarted. If I log into Kafka, the consumers are working as expected and their lags are being worked down, but Burrow is unable to refresh and get the group info.
Any update or workaround for this issue? It's been happening most of the time.
We've faced the same issue. The Kafka consumers are active and working fine, but they are not listed in Burrow's consumers.