confluent-kafka-python

`Consumer.poll` goes into infinite loop

Open r-priyam opened this issue 1 year ago • 5 comments

Description

`Consumer.poll` goes into an infinite loop when Kafka is running on Kubernetes. When the cluster is rescheduled onto a new node, the client just gets stuck on the error below:

Screenshot 2024-06-20 at 4 02 28 PM

As the screenshot shows, the service was stuck and then recovered after about 12 hours.

How to reproduce

import socket
from confluent_kafka import Consumer, Message

consumer = Consumer(
    {
        "bootstrap.servers": "",
        "client.id": socket.gethostname(),
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,
        "group.id": "",
    }
)
consumer.subscribe(topics=[...])
msg: Message = consumer.poll(timeout=10)  # Goes into infinite loop here and gets stuck

Run the code above, then restart the Kafka cluster or move it to a different node in k8s.

Checklist

Please provide the following information:

  • [ ] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): 2.1.1
  • [ ] Apache Kafka broker version:
  • [x] Client configuration: { "bootstrap.servers": "", "client.id": socket.gethostname(), "auto.offset.reset": "earliest", "enable.auto.commit": False, "group.id": "", }
  • [x] Operating system: Linux
  • [ ] Provide client logs (with 'debug': '..' as necessary)
  • [ ] Provide broker log excerpts
  • [x] Critical issue

r-priyam avatar Jun 20 '24 10:06 r-priyam

It looks like the host is not up or not reachable.

pranavrth avatar Jun 20 '24 10:06 pranavrth

@pranavrth the host comes back up within 2 minutes at most. It's just k8s scheduling it on another node, which changes the IP. The service container works fine once it's restarted.

r-priyam avatar Jun 20 '24 12:06 r-priyam

@pranavrth, any luck, please? It appears that the package handles this issue at a very low level and does not raise an exception?

r-priyam avatar Jun 25 '24 19:06 r-priyam

@pranavrth, after digging further into the source code, we found that we can set "error_cb" in the config to get a callback whenever an error occurs. It would be good if the client raised the error directly instead of requiring a callback.

r-priyam avatar Jun 26 '24 08:06 r-priyam
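
For readers following along, here is a minimal sketch of the "error_cb" workaround described in the comment above. It turns selected callback errors into an exception raised from the poll loop. The names `ErrorTracker`, `BrokersUnavailable`, and `run` are hypothetical and not part of confluent_kafka. Treating `_ALL_BROKERS_DOWN` as a reason to give up is an assumption of this sketch, since the client normally keeps retrying on its own.

```python
import socket


class BrokersUnavailable(RuntimeError):
    """Hypothetical exception type for this sketch; not part of confluent_kafka."""


class ErrorTracker:
    """Callable passed as error_cb; remembers the last error we choose to escalate."""

    # Error names treated as "cannot reach the cluster" (an assumption of this sketch).
    CONNECTIVITY_ERRORS = {"_ALL_BROKERS_DOWN"}

    def __init__(self):
        self.last_error = None

    def __call__(self, err):
        # err is a confluent_kafka.KafkaError; name() returns e.g. "_ALL_BROKERS_DOWN".
        if err.fatal() or err.name() in self.CONNECTIVITY_ERRORS:
            self.last_error = err

    def raise_if_failed(self):
        # Raise once per recorded error, then clear it.
        if self.last_error is not None:
            err, self.last_error = self.last_error, None
            raise BrokersUnavailable(str(err))


def run(bootstrap_servers, group_id, topics):
    # Imported here so the helper classes above have no third-party dependency.
    from confluent_kafka import Consumer

    tracker = ErrorTracker()
    consumer = Consumer(
        {
            "bootstrap.servers": bootstrap_servers,
            "client.id": socket.gethostname(),
            "group.id": group_id,
            "auto.offset.reset": "earliest",
            "enable.auto.commit": False,
            "error_cb": tracker,
        }
    )
    consumer.subscribe(topics)
    try:
        while True:
            msg = consumer.poll(timeout=10.0)  # callbacks are served from inside poll()
            tracker.raise_if_failed()
            if msg is None:
                continue  # timeout, nothing received
            if msg.error():
                print("message error:", msg.error())
                continue
            print("received:", msg.topic(), msg.partition(), msg.offset())
    finally:
        consumer.close()
```

Calling `run(...)` with real broker, group, and topic values would let a supervisor (for example the k8s restart policy) restart the service when `BrokersUnavailable` propagates, instead of leaving it polling silently.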

Can you enable debug logging by using 'debug': 'all' in the config and send us the logs to investigate?

pranavrth avatar Jun 28 '24 10:06 pranavrth
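
As a rough illustration of the request above, the sketch below adds 'debug': 'all' to the reporter's configuration and routes the client logs through Python's `logging` module. The helper names `with_debug` and `build_consumer` are hypothetical. Passing `logger=` to `Consumer` is how the client's logs are forwarded here, and the logger name "consumer" is just a placeholder.

```python
import logging
import socket


def with_debug(config, contexts="all"):
    """Return a copy of config with librdkafka debug logging enabled."""
    debug_config = dict(config)
    debug_config["debug"] = contexts  # e.g. "all", or a narrower set such as "broker,cgrp"
    return debug_config


# Same settings as in the report; empty values are left as posted.
base_config = {
    "bootstrap.servers": "",
    "client.id": socket.gethostname(),
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
    "group.id": "",
}
debug_config = with_debug(base_config)


def build_consumer(config):
    # Imported here so the config helper above has no third-party dependency.
    from confluent_kafka import Consumer

    logger = logging.getLogger("consumer")  # placeholder logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    # Forwarded log lines are emitted while the application calls poll().
    return Consumer(config, logger=logger)
```

With real connection values filled in, `build_consumer(debug_config)` would produce a consumer whose debug output can be captured around the time the cluster is rescheduled.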

Closing as further info was not provided

MSeal avatar Jul 23 '25 22:07 MSeal