quickwit icon indicating copy to clipboard operation
quickwit copied to clipboard

Refactor handling of pre and post rebalance events

Open guilload opened this issue 1 year ago • 5 comments

Description

I moved the event handling logic into the main actor using sync channels in the consumer context. This simplifies the code, removes the need for messing with the tokio runtime, and makes testing easier.

How was this PR tested?

Pending

guilload avatar Aug 02 '22 18:08 guilload

This isn't working for me. I see the pre_rebalance event but never see the post_rebalance event, it looks like it's blocking waiting for ack_rx in pre_rebalance.

kstaken avatar Aug 02 '22 19:08 kstaken

It looks like RebalanceEvent::Starting never makes it to emit_batches so the whole process blocks on the first rebalance.

kstaken avatar Aug 02 '22 20:08 kstaken

Yes, there's a deadlock t_t

guilload avatar Aug 02 '22 22:08 guilload

I fixed the deadlock. Since we're executing sync code in the rebalance hooks, calling consumer.recv() is actually sync and blocking. Running the consumer poll loop in a blocking tokio task solves the issue.

guilload avatar Aug 04 '22 17:08 guilload

Tests are not fully passing because we ignore the passed checkpoint. Therefore, they need to be rewritten.

guilload avatar Aug 04 '22 17:08 guilload

@guilload Approved. Great work there!

It's a bummer librdkafka only offers a blocking API. Hopefully things get better in the future. You did a great job at wrapping that in spawn_blocking task in a readable way.

A little bit of text explaining that contraption and links to Kafka docs / blog post explaining the Revoke -> Barrier -> Assign dance would be awesome.

fulmicoton avatar Aug 24 '22 19:08 fulmicoton