Frequent snapshots can accumulate to large sizes
Since every snapshot is a full snapshot, taking snapshots frequently can quickly use up a noticeable amount of storage. A few of our users have mentioned this. We should improve the situation for them, either by cleaning up older snapshots that are no longer needed or by using incremental snapshots that build upon previous ones.
The former is probably easier to achieve: a snapshot manager could delete older snapshots beyond a configured number of snapshots to retain.
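A minimal sketch of such a retention rule, in Rust. `SnapshotInfo`, its fields, and `snapshots_to_prune` are hypothetical names chosen for illustration, not existing Restate types. The sketch assumes that each snapshot records the log position it covers, so that snapshots can be ordered from newest to oldest.

```rust
/// Hypothetical snapshot metadata; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotInfo {
    pub id: String,
    /// Assumed: log position up to which this snapshot covers the partition state.
    pub min_applied_lsn: u64,
}

/// Returns the snapshots that can be deleted, keeping the `retain` newest ones.
/// `retain` is clamped to at least 1 so the latest snapshot is never pruned.
pub fn snapshots_to_prune(snapshots: &[SnapshotInfo], retain: usize) -> Vec<SnapshotInfo> {
    let retain = retain.max(1);
    if snapshots.len() <= retain {
        return Vec::new();
    }
    let mut sorted = snapshots.to_vec();
    // Newest first, ordered by the log position each snapshot covers.
    sorted.sort_by(|a, b| b.min_applied_lsn.cmp(&a.min_applied_lsn));
    // Everything after the first `retain` entries is older and may be deleted.
    sorted.split_off(retain)
}
```

If the manager calls this only after a new snapshot has been uploaded successfully, pruning stops on its own whenever snapshotting stalls.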
@pcholakov you probably talked to Ahmed about this as well, right? What's your intuition for a possible solution to the problem?
@tillrohrmann indeed we did!
A common cause of large snapshot repositories is that the record interval we use in our docs and examples is far too low. Let's increase it drastically: something on the order of 1-10M records might be a much more appropriate starting point.
We could also make it possible to express the snapshot interval in terms of accumulated log bytes, which makes it easier to reason about catch-up time when replaying the log, or in terms of time, which removes the need for external cron jobs. Tracking byte offsets requires some work in Bifrost, however.
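A sketch of how record-, byte-, and time-based triggers could be combined so that a snapshot is due as soon as any configured threshold is reached. `SnapshotPolicy`, `SinceLastSnapshot`, and their fields are hypothetical and are not existing Restate configuration options; the byte counter assumes Bifrost can report accumulated log bytes, which, as noted above, it does not yet track.

```rust
use std::time::{Duration, Instant};

/// Hypothetical trigger thresholds; any subset may be configured.
#[derive(Debug, Clone, Default)]
pub struct SnapshotPolicy {
    pub every_records: Option<u64>,
    pub every_log_bytes: Option<u64>,
    pub every_interval: Option<Duration>,
}

/// Progress accumulated since the last successful snapshot.
#[derive(Debug, Clone)]
pub struct SinceLastSnapshot {
    pub records: u64,
    /// Assumed to be supplied by the log; not currently tracked.
    pub log_bytes: u64,
    pub taken_at: Instant,
}

impl SnapshotPolicy {
    /// A snapshot is due once any configured threshold has been reached.
    pub fn is_due(&self, progress: &SinceLastSnapshot, now: Instant) -> bool {
        let records_due = self.every_records.map_or(false, |n| progress.records >= n);
        let bytes_due = self.every_log_bytes.map_or(false, |n| progress.log_bytes >= n);
        // Skip time-based snapshots of an idle partition: nothing new to capture.
        let time_due = progress.records > 0
            && self
                .every_interval
                .map_or(false, |d| now.saturating_duration_since(progress.taken_at) >= d);
        records_due || bytes_due || time_due
    }
}
```

Skipping time-based snapshots when no records were applied avoids filling the repository with identical snapshots of idle partitions.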
If Restate tracks the contents of multiple snapshots in the repository, it can take over pruning too. This is safer than relying on the object store's native pruning mechanism, which carries the risk that it keeps deleting old snapshots even when no new ones are being produced. Restate, by contrast, knows exactly when older snapshots are no longer needed.
If we are tracking the objects stored in the repository anyway, we can also move towards making successive snapshots incremental when the same node produces them. I saw good results with this approach when I tested a prototype a couple of months back.
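A minimal sketch of the bookkeeping incremental snapshots would need. It assumes, hypothetically, that a snapshot is described by a manifest listing immutable, uniquely named data files (as RocksDB SST files are), so a new snapshot only uploads files the previous one did not already store. Pruning then deletes only files that no retained manifest still references. `Manifest`, `files_to_upload`, and `files_to_delete` are illustrative names, not Restate APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical manifest: the set of immutable files that make up one snapshot.
#[derive(Debug, Clone, Default)]
pub struct Manifest {
    pub files: BTreeSet<String>,
}

/// Files that must be uploaded for `next`, given what `previous` already stored.
pub fn files_to_upload(previous: Option<&Manifest>, next: &Manifest) -> BTreeSet<String> {
    match previous {
        // Only files new since the previous snapshot need uploading.
        Some(prev) => next.files.difference(&prev.files).cloned().collect(),
        // Without a previous snapshot, this one is a full snapshot.
        None => next.files.clone(),
    }
}

/// Files that are safe to delete when the `pruned` manifests are dropped:
/// those not referenced by any manifest that is being retained.
pub fn files_to_delete(pruned: &[Manifest], retained: &[Manifest]) -> BTreeSet<String> {
    let still_referenced: BTreeSet<&String> =
        retained.iter().flat_map(|m| m.files.iter()).collect();
    pruned
        .iter()
        .flat_map(|m| m.files.iter())
        .filter(|f| !still_referenced.contains(f))
        .cloned()
        .collect()
}
```

Because files are shared across snapshots, deleting whole snapshot prefixes would no longer be safe. This is why incremental snapshots depend on Restate tracking repository contents itself.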
Finally, if we lean more on peer-to-peer snapshot transfers, S3/object-store-based snapshots might become more of a backup mechanism. This might fit in better with the upcoming multi-DB feature. I haven't fully thought through the implications of that yet, but it could allow archiving entire groups of partitions.