confluent-kafka-python
confluent-kafka-python copied to clipboard
Producer gets stuck in init_producer_id loop
Description
Observed a pathological behavior which is missing in other clients (java's kafka-clients & franz-go).
With disabled transactions when any client (including confluent_kafka) is about to send init_producer_id it doesn't use txn coordinator, picks a broker randomly and sends a request there. If the request fails resulting in broker returning a retry-able error kafka-clients & franz-go restart the procedure from scratch, pick a random broker again and use it to retry the request while confluent_kafka is reusing the original broker. In case the problem with the broker is permanent kafka-clients & franz-go has a chance to choose a healthy node while confluent_kafka gets stuck.
How to reproduce
I've used Redpanda but based on the source code the same problem affects Kafka too (not sure if Kafka returns retry-able error for init_producer_id but it returns unknown_server_error which confluent_kafka treats the same way and retries). The repro step:
- have a 3 node cluster
- isolate a node
- start producing record
- if the producer sends
init_producer_idto the isolated node it gets stuck
Client logs (docker-rp-4 was isolated):
%4|1651850857.547|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850860.049|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850862.550|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850863.050|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850873.155|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850883.259|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
%4|1651850886.100|GETPID|rdkafka#producer-7| [thrd:main]: Failed to acquire idempotence PID from broker docker-rp-4:9092/1: Broker: Not coordinator: retrying
Checklist
Please provide the following information:
- [ ] confluent-kafka-python version: 1.8.2
- [ ] Apache Kafka broker version: Redpanda v22.1.2
- [ ] Client configuration:
{"bootstrap.servers": ..., "request.required.acks": -1, "retries": 5, "enable.idempotence": True } - [ ] Operating system: ubuntu 20.04
- [ ] Provide client logs
- [ ] Provide broker log excerpts
- [ ] Critical issue
@edenhill - seems like something we could improve
Good find! Created upstream librdkafka issue: https://github.com/edenhill/librdkafka/issues/3848
tracking this in the above mentioned librdkafka issue.