Trimming journal
The idea is to let users trim the journal at a given command, removing all the commands that happened afterwards plus their completions.
Mark and copy up to (but excluding) the trim point.
The challenge of this feature is the restart mechanism. This is roughly how I'm gonna implement it:
- Add a new field to invocation status, `restarts`. We start writing this in 1.3, and when not present, its value defaults to 0 (this makes back/front compat easy).
- When sending the `Invoke` command to the invoker, we send this `restarts` field. Sending `Invoke` commands to the invoker is a transient message, not written to storage, so all good for versioning.
- When the invoker reads the journal, it reads the invocation status, and when doing so it also reads this `restarts` field. If the `restarts` count doesn't match the one of the invoke command, boom, this command is invalid and discarded.
- When the PP sends an `Abort` command, it means the state machine either transitioned the invocation to `End` or incremented the `restarts` count, thus making sure that the invoker will fence off old streams. This field is also used internally in the invoker <-> invocation task communication to fence off old messages.
- When the invoker sends `InvokerEffect` to PP it attaches the `restarts` count. The `restarts` count is not written when 0, thus making sure back-compat is easy.
- If the invoker gets an `Invoke` with a higher `restarts` count from the state machine, it aborts the previous one. Essentially either `Abort` with `restarts` or `Invoke` with `restarts + 1` wins.
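To make the fencing rules above concrete, here is a minimal sketch of how an invoker could track the `restarts` count per invocation. All type and function names (`Command`, `Invoker`, `Outcome`, `accept_effect`) are hypothetical and not taken from the codebase. How an `Abort` carrying a lower or higher count than the running task should be treated is also an assumption: this sketch ignores stale aborts and honours newer ones.

```rust
use std::collections::HashMap;

/// Hypothetical messages sent by the partition processor (PP) to the invoker.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Invoke { id: u64, restarts: u32 },
    Abort { id: u64, restarts: u32 },
}

/// What the invoker did with a command (illustrative only).
#[derive(Debug, PartialEq)]
enum Outcome {
    Started,
    Restarted,
    Aborted,
    Ignored,
}

/// Hypothetical invoker state: restarts count of the task running per invocation.
#[derive(Default)]
struct Invoker {
    running: HashMap<u64, u32>,
}

impl Invoker {
    fn handle(&mut self, cmd: Command) -> Outcome {
        match cmd {
            Command::Invoke { id, restarts } => match self.running.get(&id).copied() {
                None => {
                    self.running.insert(id, restarts);
                    Outcome::Started
                }
                // A higher restarts count aborts the previous task and replaces it.
                Some(current) if restarts > current => {
                    self.running.insert(id, restarts);
                    Outcome::Restarted
                }
                // Same or lower count: stale or duplicate Invoke, fenced off.
                Some(_) => Outcome::Ignored,
            },
            Command::Abort { id, restarts } => match self.running.get(&id).copied() {
                // Assumption: an Abort for the running count (or a newer one) wins.
                Some(current) if current <= restarts => {
                    self.running.remove(&id);
                    Outcome::Aborted
                }
                // Assumption: an Abort for an older count, or for nothing, is ignored.
                _ => Outcome::Ignored,
            },
        }
    }

    /// Messages from invocation tasks carry the restarts count they started with;
    /// anything that does not match the running task is fenced off.
    fn accept_effect(&self, id: u64, restarts: u32) -> bool {
        self.running.get(&id) == Some(&restarts)
    }
}
```

With this shape, after `Invoke { restarts: 1 }` replaces a task started with `restarts: 0`, any message still arriving from the old task fails `accept_effect` and is dropped.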
I'm also gonna proceed to remove the Killed state, as it's not needed anymore.
> When the PP sends `InvokerEffect` it attaches the `restarts` count. The `restarts` count is not written when 0, thus making sure back-compat is easy.
Did you mean the invoker instead of the PP?
Introducing something like an `invocation_epoch` sounds like a good idea to me. Off the top of my head, it should solve the problem that the Killed status tried to solve before, in a nicer way.
fyi @AhmedSoliman
Updating this with new findings. Fencing off invoker effects is not enough: we also need to fence off completions coming from other PPs that belong to old invocation epochs. This is how I plan to do that:
ServiceInvocationResponseSink and friends need to carry around the invocation epoch of the caller invocation.
Then we need to store the following data structure in the caller invocation status:
max_epoch_per_comp_range: map<numeric range of completion_id, maximum inclusive epoch allowed>
This data structure is updated accordingly every time we trim. The invariant of this data structure is that ranges MUST be NON overlapping. This data structure seems to fit https://docs.rs/rangemap/latest/rangemap/inclusive_map/struct.RangeInclusiveMap.html
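As an illustration of the non-overlapping invariant, here is a minimal std-only stand-in for such a range map, built on `BTreeMap` rather than the `rangemap` crate. The type `EpochRanges` and its methods are hypothetical. How a trim translates into inserts is not spelled out above, so the sketch only shows that inserting a range cuts away the overlapping parts of existing ranges.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a non-overlapping inclusive range map.
/// Key: range start. Value: (inclusive range end, epoch).
#[derive(Default, Debug)]
struct EpochRanges {
    ranges: BTreeMap<u32, (u32, u32)>,
}

impl EpochRanges {
    /// Inserts `start..=end -> epoch`, trimming or splitting any existing
    /// range that overlaps, so ranges never overlap afterwards.
    fn insert(&mut self, start: u32, end: u32, epoch: u32) {
        assert!(start <= end, "empty range");
        // Existing ranges that begin at or before `end` and finish at or after `start`.
        let overlapping: Vec<(u32, (u32, u32))> = self
            .ranges
            .range(..=end)
            .filter(|(_, (e, _))| *e >= start)
            .map(|(s, v)| (*s, *v))
            .collect();
        for (s, (e, old_epoch)) in overlapping {
            self.ranges.remove(&s);
            if s < start {
                // Keep the part of the old range left of the new one.
                self.ranges.insert(s, (start - 1, old_epoch));
            }
            if e > end {
                // Keep the part of the old range right of the new one.
                self.ranges.insert(end + 1, (e, old_epoch));
            }
        }
        self.ranges.insert(start, (end, epoch));
    }

    /// Returns the epoch recorded for the range containing `id`, if any.
    fn get(&self, id: u32) -> Option<u32> {
        self.ranges
            .range(..=id)
            .next_back()
            .filter(|(_, (end, _))| *end >= id)
            .map(|(_, (_, epoch))| *epoch)
    }
}
```

For example, inserting `0..=9 -> 0` and then `4..=6 -> 1` leaves three ranges (`0..=3`, `4..=6`, `7..=9`), which is the same overwrite behaviour `RangeInclusiveMap::insert` provides.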
And then the algorithm when I get a journal entry (which can be either command, completion or signal) is as follows:
on entry:
  if no entry.epoch or entry is signal: accept
  if entry.epoch equal: accept
  if entry.epoch different:
    if entry is command: discard // This is the case of invoker sending commands for old epochs
    if entry is completion:
      if max_epoch_per_comp_range[completion.id] <= entry.epoch: accept
  all the other cases: discard
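The decision procedure above could be sketched as a single function. The names `Entry`, `EntryKind`, and `should_accept` are hypothetical, and the range map is abstracted as a lookup closure. The comparison direction follows the pseudocode literally, and a completion id with no recorded range is treated as one of "all the other cases", which is an assumption about the intended behaviour.

```rust
/// Hypothetical journal entry kinds; only completions carry an id here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EntryKind {
    Command,
    Completion { id: u32 },
    Signal,
}

/// Hypothetical journal entry with an optional invocation epoch.
#[derive(Debug, Clone, Copy)]
struct Entry {
    kind: EntryKind,
    epoch: Option<u32>,
}

/// Returns true if the entry should be accepted for an invocation currently
/// at `current_epoch`. `range_epoch(id)` stands in for the lookup
/// `max_epoch_per_comp_range[id]` and returns None when no range covers `id`.
fn should_accept(
    entry: &Entry,
    current_epoch: u32,
    range_epoch: impl Fn(u32) -> Option<u32>,
) -> bool {
    // No epoch attached, or a signal: always accept.
    let epoch = match (entry.kind, entry.epoch) {
        (EntryKind::Signal, _) | (_, None) => return true,
        (_, Some(e)) => e,
    };
    // Entry belongs to the current epoch: accept.
    if epoch == current_epoch {
        return true;
    }
    match entry.kind {
        // The invoker is sending commands for an old epoch: discard.
        EntryKind::Command => false,
        // Accept only if the epoch recorded for this completion id is <= entry epoch.
        EntryKind::Completion { id } => {
            matches!(range_epoch(id), Some(stored) if stored <= epoch)
        }
        // Signals already returned above; kept for exhaustiveness.
        EntryKind::Signal => true,
    }
}
```

Passing the lookup as a closure keeps the check independent of the concrete map type, so it works the same with `RangeInclusiveMap` or any other structure that upholds the non-overlapping invariant.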
For now we won't make awakeables epoch-aware: this adds a reasonable amount of complexity and is not even necessarily what the user wants. Plus, trim and restart should be a break-the-glass operation, like kill, so it is expected that some inconsistencies might arise. When we get to expose signals, this problem will also go away, as users will manually input the correlation id to complete.
A note about naming and relationship with kill:
- The name of the operation will be `reset`:
  - From beginning (this will be present in the UI but not in the Admin API itself)
  - From given entry index
- By default, it will kill child invocations that have been trimmed
- By default, it will revert the state
Both the kill and reset REST endpoints should start exposing the following knobs:
- Kill child invocations
- Revert state
We keep the defaults we have for kill (changing the default would require https://github.com/restatedev/restate/issues/2765), in any case we're gonna play with these defaults in the UI.