prometheus-kafka-consumer-group-exporter
Timeout must not be negative
The container sometimes fails with the following error:
Traceback (most recent call last):
  File "/usr/local/bin/prometheus-kafka-consumer-group-exporter", line 11, in <module>
    load_entry_point('prometheus-kafka-consumer-group-exporter', 'console_scripts', 'prometheus-kafka-consumer-group-exporter')()
  File "/usr/src/app/prometheus_kafka_consumer_group_exporter/__init__.py", line 165, in main
    for message in consumer:
  File "/usr/local/lib/python3.8/site-packages/kafka/consumer/group.py", line 1181, in __next__
    return self.next_v2()
  File "/usr/local/lib/python3.8/site-packages/kafka/consumer/group.py", line 1189, in next_v2
    return next(self._iterator)
  File "/usr/local/lib/python3.8/site-packages/kafka/consumer/group.py", line 1106, in _message_generator_v2
    record_map = self.poll(timeout_ms=timeout_ms, update_offsets=False)
  File "/usr/local/lib/python3.8/site-packages/kafka/consumer/group.py", line 635, in poll
    assert timeout_ms >= 0, 'Timeout must not be negative'
AssertionError: Timeout must not be negative
Any ideas?
Hi @gfelixc, that's not something I've seen, but it looks like it might be a bug in the kafka-python library.
In KafkaConsumer, next_v2() sets _consumer_timeout to some time in the future (based on consumer_timeout_ms), and then calls next() on _message_generator_v2() while _consumer_timeout hasn't been reached. _message_generator_v2() then subtracts the current time (time.time()) from _consumer_timeout to get the timeout_ms to pass to poll().
If too much time elapses between checking whether _consumer_timeout has been reached and calculating timeout_ms, the result could end up negative. It seems like _message_generator_v2() should check for this and use 0 if it calculates a negative timeout_ms.
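To illustrate the idea, here is a minimal standalone sketch of the clamping I have in mind. It does not use kafka-python at all: remaining_timeout_ms() is a hypothetical helper name, and the deadline and clock values are made up to stand in for _consumer_timeout and time.time().

```python
import time


def remaining_timeout_ms(deadline, now=None):
    """Return the milliseconds left until `deadline` (epoch seconds).

    Hypothetical helper, not part of kafka-python. The result is clamped
    to 0 so it can never be negative, even if the deadline has passed.
    """
    if now is None:
        now = time.time()
    return max(0.0, 1000 * (deadline - now))


# Stand-in for _consumer_timeout: a deadline expressed in epoch seconds.
deadline = 100.0

# The deadline check passed slightly before 100.0, but the clock has moved
# past it by the time the timeout is calculated.
late_now = 100.002

unclamped_ms = 1000 * (deadline - late_now)          # negative: would trip the assert in poll()
clamped_ms = remaining_timeout_ms(deadline, late_now)  # 0.0: safe to pass as timeout_ms
```

With the clamp, a late calculation simply becomes a non-blocking poll instead of an AssertionError.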
Assuming my quick analysis is correct, would you be able to raise an issue (or PR) with kafka-python to get this fixed?
(Just to check - you haven't changed consumer_timeout_ms from the default 500ms, have you?)
The image was deployed as is, with no config changes. I'll raise this with kafka-python as you suggest and let you know once it's fixed. Thanks a lot. Would you mind keeping this ticket open until a kafka-python release with the fix is available?
No worries, I'll keep this open. If you could link to the kafka-python issue once it's created, that'd be great.