Subscribe to topic partition list
Currently when you want to subscribe to a topic - you are forced to subscribe to "default" offset and partitions, as in Consumer#subscribe you only can specify topic names. The underlying rd_kafka_subscribe accepts TopicPartitionList that allows choosing partitions/offsets if needed. I'd like to have a way to either subscribe to a TPL directly or have a way to pass more data to Consumer#subscribe, so that things like "explicitly consume from particular offset" could be easily done.
WDYT? What API would you like to have in the library for this?
To not create a separate subscribe_list method, I'd propose something like this:
def subscribe(*topics_or_tpl)
if topics_or_tpl.length == 1 && topics_or_tpl.first.is_a?(TopicPartitionList)
tpl = topics_or_tpl.first.to_native_tpl
else
# construct tpl the old way
end
# ...
end
@mensfeld WDYT?
Subscribe to topic set using balanced consumer groups.
You can subscribe to the topic and use the rebalance_cb to set expected offsets.
I know that librdkafka does not have pluggable assignment strategies (ref: https://github.com/edenhill/librdkafka/issues/2284) and this could potentially partially mitigate this.
I'm hesitant to recommend a separate method for TPL-based assignments like subscribe_via_tpl but not sure.
Aside from that: I would opt to make it fail-safe. That is, making sure that only valid TPLs can be used and if not, adding some sort of notifications around that. Doing it that way requires getting the topics metadata from the cluster in a non-cached way preferably.
Overall I do see it as an advanced use-case that this library should support. Is it a high-priority one? Def. not to me.
But this is what rdkafka actually allows you to set in it's example. The main usecase for me is to set offset to the beginning of the partition explicitly upon subscribe. I don't necessarily see it as being advanced :)
The main usecase for me is to set offset to the beginning of the partition explicitly upon subscribe
I was referring to building a custom assignment flow per partition to load-balance processes subscriptions.
The usecase of setting it explicitly upon the first usage differently per topic def. should be supported.
It is still not the top priority for me but if we all agree on the API (cc @thijsc ) I don't see any reason not to work on it together :)
I'm totally fine writing it myself, just want to agree on API beforehand.
From usability standpoint def subscribe(*topics_or_tpl) where there may be multiple topics or single tpl is the best one. It's backwards compatible and still has good API when using TPL
I agree about usability, I don't like the complexity of handling two sets/types of incoming arguments in the same method.
@fxposter you want to revisit this with me? :) happy to make it move forward