rust-rdkafka icon indicating copy to clipboard operation
rust-rdkafka copied to clipboard

Event interface

Open vojtechkral opened this issue 4 years ago • 5 comments

Hi, we're using rust-rdkafka at @braiins . However, we found out that we probably wouldn't like to use the interface as is because of the requirement to spin up threads per stream and/or polling. We decided to fork it internally and switch to the event interface of librdkafka, which seems to be pretty efficient and goes quite well with the Rust's Futures model. At least a lot better compared to the callback interface, which is kind of hard to hook up with Futures/Streams without additional threads and/or long polling. I have written support for the evented interface in rust-rdkafka for our internal fork and after that we observed a multiple-times improvement in performance.

I noticed there is some work being done to get rid of the threads, however, I didn't study it in detail and if I'm looking right the evented interface is not being used...

I wanted to ask:

  • Is there a technical reason why the evented interface is not used in rust-rdkafka? Is there some kind of a downside I don't see?
  • Would you be interested in accepting code from us? I would probably need to do some cleanups, fill in documentation etc. for public usage, however, I don't want to spend time on that unless there is interest in it :-)

Thank you and thanks for making rust-rdkafka, Vojtech

vojtechkral avatar Jun 26 '20 12:06 vojtechkral

Hello!

I haven't worked on the library for a while (@benesch has been doing all the work for the one 1+ years!), but I remember that moving to the event-based librdkafka interface was one of the goals I had in mind and I was actively experimenting on ~2 years ago, so I'd definitely be interested in the change you're proposing. I can't think of any downside, but again, my mental model of the library is a bit stale.

I imagine that the rust-rdkafka public interface would remain the same?

I leave it to @benesch to comment whether he'd have time to review this change or not, since it's probably going to be a significant amount of work.

fede1024 avatar Jun 29 '20 11:06 fede1024

Apologies for the delayed response—things have been busy at work lately!

I can't promise that I'll have time to review any large changes in the short term, but hopefully in the medium-to-long term. I'm definitely interested in anything that can either simplify the implementation of rust-rdkafka or improve its performance.

That said, I'm not too familiar with librdkafka's event API. To make sure we're on the same page: you're talking about rd_kafka_conf_set_background_event_cb and friends, right? I'm not totally sure how that would fit in to the existing StreamConsumer and FutureProducer—both of those have fairly efficient implementations now, and the StreamConsumer doesn't even need a separate thread. But I can definitely believe that there are simplifications and performance improvements to be had.

So, would definitely be interested to hear more about your approach! I think no matter what we'll want to retain the polling in the BaseConsumer and BaseProducer, but there's no reason that the StreamConsumer and FutureProducer couldn't get a more efficient event-based implementation.

benesch avatar Jul 08 '20 02:07 benesch

Apologies for the delayed response—things have been busy at work lately!

Completely understandable, I know the feeling :)

That said, I'm not too familiar with librdkafka's event API.

Well librdkafka's documentation is less clear/elloquent about the event interface, it took me some headscratching & hokuspokus to grok it...

To make sure we're on the same page: you're talking about rd_kafka_conf_set_background_event_cb and friends, right?

I'm using the rd_kafka_queue_cb_event_enable() function to register a small function as an event callback - the only thing it does is that it calls a Waker that wakes a stream waiting on the consumer/producer.

I've modified the StreamConsumer to do this. I haven't modified any producer and instead created an AsyncProducer type which does the analogous thing. Both StreamConsumer and AsyncProducer have a function called event_stream() which returns an EventStream type which implements the stream, kind of like MessageStream that you already have, except it yields events and doesn't do any polling. The event types for consumer and producer differ somewhat (consumer has fetch events, producer DR events).

Another thing about this approach is that basically I don't need the Context types, my StreamConsumer type has no generic argument. The data is instead contained in the events. Although honestly I've so far only written support for the event types that I acutally use (we don't use events like rebalance or stats so far).

I'll try to share the code somewhere for peeking. Perhaps I could somehow divide the changes into smaller chunks so that they could be easier to review...

vojtechkral avatar Jul 08 '20 11:07 vojtechkral

Well librdkafka's documentation is less clear/elloquent about the event interface, it took me some headscratching & hokuspokus to grok it...

No kidding. It's hardly mentioned in issue reports or the wiki either.

I think the best path forward here would be to introduce your consumer as a new consumer type alongside your new producer. (AsyncProducer isn't quite different enough from FutureProducer IMO, but we can bikeshed the names later!) There's enough code written around the current model, plus it's the model that people familiar with librdkafka (or its bindings in other languages) will be used to, that I think we'll want to preserve it as an option.

benesch avatar Jul 09 '20 02:07 benesch

@vojtechkral I'd be very interested to see what you have. Can you make it available somewhere?

davidblewett avatar Jan 04 '22 22:01 davidblewett