
WIP: Distributed Kafka consumer based on group subscription

Open kstaken opened this issue 1 year ago • 0 comments

Description

This converts kafka_source to rely on Kafka consumer group membership for partition assignment, while offset storage is still maintained by Quickwit. Once completed, this will preserve exactly-once processing semantics on PostgreSQL-backed metastores while allowing multiple nodes to participate in data ingest.
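To illustrate why keeping offsets in the metastore rather than in Kafka can give exactly-once behavior, here is a minimal sketch of a checkpoint delta guard. It is not Quickwit's actual code: `Checkpoint`, `PartitionDelta`, `PublishError`, and `try_publish` are hypothetical names, and an in-memory map stands in for what would presumably be a single database transaction that also publishes the split.

```rust
use std::collections::HashMap;

/// Offsets covered by one split for one partition: everything after `from`, up to `to`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionDelta {
    pub from: u64,
    pub to: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PublishError {
    /// The delta does not start where the stored checkpoint ends.
    StaleDelta { partition: i32, expected_from: u64, got_from: u64 },
    /// The delta does not move the checkpoint forward.
    EmptyDelta { partition: i32 },
}

/// Hypothetical stand-in for the per-source checkpoint held by the metastore.
#[derive(Debug, Default)]
pub struct Checkpoint {
    positions: HashMap<i32, u64>,
}

impl Checkpoint {
    /// Last published position for a partition (0 if nothing was published yet).
    pub fn position(&self, partition: i32) -> u64 {
        *self.positions.get(&partition).unwrap_or(&0)
    }

    /// Accept a split only if every partition delta starts exactly at the
    /// stored position. All deltas are validated before any is applied, so a
    /// rejected publication leaves the checkpoint untouched.
    pub fn try_publish(
        &mut self,
        delta: &HashMap<i32, PartitionDelta>,
    ) -> Result<(), PublishError> {
        for (&partition, d) in delta {
            if d.to <= d.from {
                return Err(PublishError::EmptyDelta { partition });
            }
            let expected_from = self.position(partition);
            if d.from != expected_from {
                return Err(PublishError::StaleDelta {
                    partition,
                    expected_from,
                    got_from: d.from,
                });
            }
        }
        for (&partition, d) in delta {
            self.positions.insert(partition, d.to);
        }
        Ok(())
    }
}
```

Under these assumptions, a node that lost a partition during a rebalance and later tries to publish a split built from old offsets is rejected, because its delta no longer lines up with the stored checkpoint.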

Pending

  • [ ] Invalidate in-flight splits during rebalance on the local node
  • [ ] Guard against stale split publication from remote nodes
  • [ ] Optimize rebalancing so split invalidation only occurs if partition assignments change
  • [ ] Ensure the exactly-once guards leave no orphaned splits in Staged status. This currently happens with the checkpoint delta guard.
  • [ ] Merge planner fails when there are parallel indexers #1795
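
The first and third pending items could be approached roughly as follows. This is only a sketch under stated assumptions, not what the PR implements: `SourceState`, `on_rebalance`, and `may_publish` are hypothetical names, and `generation` is a local counter introduced here for illustration, not Kafka's consumer group generation ID.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-source state kept on the local indexing node.
#[derive(Debug, Default)]
pub struct SourceState {
    /// Partitions currently assigned to this node.
    assignment: BTreeSet<i32>,
    /// Local counter bumped whenever the assignment actually changes.
    generation: u64,
}

impl SourceState {
    /// Generation that a newly started split should be tagged with.
    pub fn generation(&self) -> u64 {
        self.generation
    }

    /// Called when the consumer group hands this node a new assignment.
    /// Returns true if in-flight splits were invalidated.
    pub fn on_rebalance(&mut self, new_assignment: impl IntoIterator<Item = i32>) -> bool {
        let new_assignment: BTreeSet<i32> = new_assignment.into_iter().collect();
        if new_assignment == self.assignment {
            // Same partitions as before: keep in-flight splits.
            return false;
        }
        self.assignment = new_assignment;
        self.generation += 1;
        true
    }

    /// A split may be published only if it was started under the current assignment.
    pub fn may_publish(&self, split_generation: u64) -> bool {
        split_generation == self.generation
    }
}
```

Comparing assignments as sets means a rebalance that returns the same partitions in a different order does not discard work. This local check does not cover stale publication from remote nodes, which still needs a guard on the metastore side.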

This replaces PR #1798 to incorporate collaboration with @guilload.

How was this PR tested?

Kafka integration tests currently pass. Basic manual testing with multiple Quickwit instances confirmed that rebalances and offset restoration appear correct. More advanced testing to verify results is pending completion of the exactly-once processing behavior.

fixes #1794

kstaken, Aug 01 '22 20:08