confluent-kafka-python
confluent-kafka-python copied to clipboard
Considerations when directly connecting to partitions instead of using rebalance?
This is a question, not an issue.
We've been using the normal rebalancing strategy, whereby n consumer threads are launched and allocated to partitions by kafka. However, this has occasionally resulted in issues, where a rouge thread accesses a topic with the same consumer group, causing a rebalance. It is not a regular occurrence but it has happened enough that I would like to explore different alternatives.
I noticed that I can directly connect to a partition via:
def on_assign(consumer, partitions):
print('on_assign')
nonlocal thread_number
partitions = get_partitions(thread_number)
consumer.assign(partitions)
Where get_partitions(thread_number) can query the RDBMS for a list of partitions/offsets this thread should request.
This seems to work. I opened multiple terminals and connected to the same partition, and it appears to behave as expected. The only thing I noticed is that when a second thread connects it will cause a "rebalance", which in this case simply means that on_assign is called again, but the correct partition and offset is still assigned since I manage the offsets in the RDBMS.
Is this a valid approach or is there any considerations that should be taken into account?
that's a reasonable approach if you have a static group (number of consumers doesn't change).
if you want a dynamic number of member, the group functionality gives you a lot you'd need to write yourself to do it properly - fencing, (sticky) partition assignment etc.