
Resume pipeline from a particular source-state

Open • vigith opened this issue 1 year ago • 1 comment

Problem

Today, we cannot start a Numaflow pipeline at a particular source-state. We always start from a relative state (e.g., latest or oldest offset); we cannot say "start at offset 12345". Nothing stops the user from specifying an absolute offset in the config, but the problem is that when the pod restarts, it will start again from that absolute position.

Related issue #925

Benefit

Being able to resume a pipeline from an arbitrary state will make reprocessing data much easier.

What needs to change

We do not differentiate between the first start of a pod and a restart. If we could, the pod could start at an absolute location the first time and, after a restart, follow the relative position.
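A minimal Python sketch of that idea, assuming the source keeps its last committed offset per partition in some durable store that survives pod restarts. `OffsetStore` and `resolve_start_offset` are hypothetical names, not part of Numaflow's API; the presence or absence of a committed offset is used here as the signal for "first start" versus "restart".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OffsetStore:
    """Hypothetical durable store of committed offsets, keyed by partition.

    In a real deployment this state would live outside the pod so that it
    survives restarts; a dict is used here only for illustration.
    """
    committed: Dict[int, int] = field(default_factory=dict)

    def get(self, partition: int) -> Optional[int]:
        return self.committed.get(partition)

    def commit(self, partition: int, offset: int) -> None:
        self.committed[partition] = offset


def resolve_start_offset(store: OffsetStore, partition: int,
                         configured_offset: int) -> int:
    """Decide where a source partition should begin reading.

    First start (nothing committed yet): honour the absolute offset from
    the config. Restart: ignore the config and resume right after the last
    committed offset, so a pod restart does not rewind to the configured one.
    """
    last = store.get(partition)
    if last is None:
        return configured_offset
    return last + 1
```

For example, with an empty store `resolve_start_offset(store, 0, 12345)` returns `12345`; after `store.commit(0, 12400)` the same call returns `12401`. One consequence of this design is that deliberately re-seeding a running pipeline to a new absolute offset would also require clearing (or versioning) the committed state, otherwise the restart path takes precedence.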

vigith • Sep 13 '23 21:09

Another potential benefit of going in that direction:

  • On ISBSVC errors, users could decide to rewind (automatically?) to a previous offset for which they know all messages were processed and reached the sinks, ensuring no data is missing, instead of relying on disk persistence and replication of the inter-step buffers. This requires the source to be able to replay from a previous offset, and the pipeline to be idempotent. If users can do that, they get a "failure-safe" pipeline that is much faster. The downside is that if the pipeline is long and complex, the delay caused by an ISBSVC failure may be problematic. The upside is that, for most use cases, the low probability of an unsafe-messaging failure and the small delay it incurs may make it worthwhile to take the speed boost of unsafe messaging combined with an automated offset rewind on failure.

A potential issue:

  • A user-defined source could be unable to find messages for some past offsets, depending on how it is implemented.
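The rewind idea and its caveat can be sketched as follows, assuming sinks acknowledge source offsets (possibly out of order) back to a tracker, and that the source can report the oldest offset it is still able to replay. `RewindTracker` and `rewind` are hypothetical names for illustration only. The safe rewind point is taken to be the first offset not yet known to have reached the sinks; the user-defined-source caveat appears as an error when the source can no longer serve that offset.

```python
from typing import Set


class RewindTracker:
    """Hypothetical tracker of how far the sinks have safely progressed.

    Sinks may acknowledge offsets out of order, so the safe rewind point is
    the end of the contiguous run of acknowledged offsets starting at `start`.
    """

    def __init__(self, start: int) -> None:
        self.next_expected = start      # lowest offset not yet known to be acked
        self.pending: Set[int] = set()  # acked offsets sitting beyond a gap

    def ack(self, offset: int) -> None:
        if offset < self.next_expected:
            return  # duplicate ack; the pipeline is assumed idempotent
        self.pending.add(offset)
        # Advance over the contiguous prefix of acknowledged offsets.
        while self.next_expected in self.pending:
            self.pending.remove(self.next_expected)
            self.next_expected += 1

    def rewind_offset(self) -> int:
        """Offset to replay from after an ISB service failure."""
        return self.next_expected


def rewind(tracker: RewindTracker, oldest_available: int) -> int:
    """Return the offset to resume from, or fail if the source cannot replay it."""
    target = tracker.rewind_offset()
    if target < oldest_available:
        # e.g. a user-defined source that no longer retains this data
        raise LookupError(
            f"cannot replay from offset {target}; "
            f"oldest available offset is {oldest_available}")
    return target
```

If offsets 100, 102, 101 and 105 are acknowledged in that order, the tracker's rewind point is 103: everything before it reached the sinks, while 103 and 104 have not been confirmed, so 105 would simply be reprocessed (hence the idempotency requirement).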

QuentinFAIDIDE • Mar 10 '24 21:03