rust-rdkafka
rust-rdkafka copied to clipboard
Event interface
Hi, we're using rust-rdkafka at @braiins . However, we found out that we probably wouldn't like to use the interface as is because of the requirement to spin up threads per stream and/or polling. We decided to fork it internally and switch to the event interface of librdkafka, which seems to be pretty efficient and goes quite well with the Rust's Futures model. At least a lot better compared to the callback interface, which is kind of hard to hook up with Futures/Streams without additional threads and/or long polling. I have written support for the evented interface in rust-rdkafka for our internal fork and after that we observed a multiple-times improvement in performance.
I noticed there is some work being done to get rid of the threads, however, I didn't study it in detail and if I'm looking right the evented interface is not being used...
I wanted to ask:
- Is there a technical reason why the evented interface is not used in rust-rdkafka? Is there some kind of a downside I don't see?
- Would you be interested in accepting code from us? I would probably need to do some cleanups, fill in documentation etc. for public usage, however, I don't want to spend time on that unless there is interest in it :-)
Thank you and thanks for making rust-rdkafka, Vojtech
Hello!
I haven't worked on the library for a while (@benesch has been doing all the work for the one 1+ years!), but I remember that moving to the event-based librdkafka interface was one of the goals I had in mind and I was actively experimenting on ~2 years ago, so I'd definitely be interested in the change you're proposing. I can't think of any downside, but again, my mental model of the library is a bit stale.
I imagine that the rust-rdkafka public interface would remain the same?
I leave it to @benesch to comment whether he'd have time to review this change or not, since it's probably going to be a significant amount of work.
Apologies for the delayed response—things have been busy at work lately!
I can't promise that I'll have time to review any large changes in the short term, but hopefully in the medium-to-long term. I'm definitely interested in anything that can either simplify the implementation of rust-rdkafka or improve its performance.
That said, I'm not too familiar with librdkafka's event API. To make sure we're on the same page: you're talking about rd_kafka_conf_set_background_event_cb
and friends, right? I'm not totally sure how that would fit in to the existing StreamConsumer
and FutureProducer
—both of those have fairly efficient implementations now, and the StreamConsumer
doesn't even need a separate thread. But I can definitely believe that there are simplifications and performance improvements to be had.
So, would definitely be interested to hear more about your approach! I think no matter what we'll want to retain the polling in the BaseConsumer
and BaseProducer
, but there's no reason that the StreamConsumer
and FutureProducer
couldn't get a more efficient event-based implementation.
Apologies for the delayed response—things have been busy at work lately!
Completely understandable, I know the feeling :)
That said, I'm not too familiar with librdkafka's event API.
Well librdkafka's documentation is less clear/elloquent about the event interface, it took me some headscratching & hokuspokus to grok it...
To make sure we're on the same page: you're talking about
rd_kafka_conf_set_background_event_cb
and friends, right?
I'm using the rd_kafka_queue_cb_event_enable()
function to register a small function as an event callback - the only thing it does is that it calls a Waker
that wakes a stream waiting on the consumer/producer.
I've modified the StreamConsumer
to do this. I haven't modified any producer and instead created an AsyncProducer
type which does the analogous thing. Both StreamConsumer
and AsyncProducer
have a function called event_stream()
which returns an EventStream
type which implements the stream, kind of like MessageStream
that you already have, except it yields events and doesn't do any polling. The event types for consumer and producer differ somewhat (consumer has fetch events, producer DR events).
Another thing about this approach is that basically I don't need the Context types, my StreamConsumer
type has no generic argument. The data is instead contained in the events. Although honestly I've so far only written support for the event types that I acutally use (we don't use events like rebalance or stats so far).
I'll try to share the code somewhere for peeking. Perhaps I could somehow divide the changes into smaller chunks so that they could be easier to review...
Well librdkafka's documentation is less clear/elloquent about the event interface, it took me some headscratching & hokuspokus to grok it...
No kidding. It's hardly mentioned in issue reports or the wiki either.
I think the best path forward here would be to introduce your consumer as a new consumer type alongside your new producer. (AsyncProducer
isn't quite different enough from FutureProducer
IMO, but we can bikeshed the names later!) There's enough code written around the current model, plus it's the model that people familiar with librdkafka (or its bindings in other languages) will be used to, that I think we'll want to preserve it as an option.
@vojtechkral I'd be very interested to see what you have. Can you make it available somewhere?