Understanding state handling upon message redelivery
Hello! I am looking into using Goka for my stateful Kafka processing needs. I want some clarification on what I should expect from Goka's at-least once guarantees and how it handles state when messages are redelivered.
Consider the scenario where a processor processes a message with offset N (from an input stream) and updates the state (via ctx.SetValue), but then the processor crashes after writing the new state (fully or partially) but before committing offset N back to Kafka. On restart (or failover), Kafka will re-deliver the message at offset N (since the commit didn’t happen).
My understanding after having read the code is that on startup/rebalance, Goka replays the compacted changelog into local storage up to the last committed offset. Input consumption only resumes once this replay is complete. What is not clear to me is exactly what state will be reflected to the callback upon redelivery. When message N is redelivered, will ctx.Value() reflect the state before N was originally processed or will it be after? In other words, will ctx.Value() return the same thing every time message N is redelivered? Moreover, does this answer vary across the following scenarios:
- State is updated in local storage but not persisted to the changelog topic
- State is updated in both local storage and persisted changelog topic
- State is persisted in changelog topic but not local storage (My understanding is this is not possible due to the ordering of state updates, but posing it to confirm).
Thanks!
These are great questions. I think @frairon can help you here.