Dispatch `reduce` tasks for all unmaterialized entries during start up

Open adzialocha opened this issue 1 year ago • 0 comments

Our materializer has two sorts of "events" that must be re-attempted when a node quits prematurely, to make sure we don't lose data:

  1. Re-attempt tasks
  2. Re-attempt unmaterialized operations

They seem related but are actually independent of each other: tasks do not necessarily represent arriving operations. Say an operation arrives for the first time and kicks off a reduce task, followed by a dependency task. Now the node is shut down before that dependency task finishes. If we send that operation again on restart to re-attempt the flow, the reduce task quits early, reporting that it already did its work last time. No dependency task is dispatched, and we have lost data.

The reverse is also true: re-attempting tasks alone does not cover race conditions where an operation was stored successfully but the node quit before the reduce task was created. There is no task to re-attempt, so we've lost data again.

We already solved the first point (tasks), but we also need to account for unmaterialized operations. This was not possible until now, since it wasn't easy to distinguish in our database whether an operation has been materialized or not. Now we have a sorted_index which represents that state, see: https://github.com/p2panda/aquadoggo/pull/438

On node startup we should check which operations have sorted_index = None and then issue reduce tasks for them.
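
A rough sketch of that startup check, for discussion only. `StoredOperation`, `Task` and `reduce_tasks_for_unmaterialized` are hypothetical names invented for this example and not aquadoggo's actual types or API. It also assumes that one reduce task per affected document is enough, so operations are deduplicated by document id:

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of an operation row in the database.
#[derive(Debug, Clone)]
pub struct StoredOperation {
    pub operation_id: String,
    pub document_id: String,
    /// `None` means the operation has not been materialized yet.
    pub sorted_index: Option<i32>,
}

/// Hypothetical task description: a worker name plus the document to process.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Task {
    pub worker: &'static str,
    pub document_id: String,
}

/// Collect one `reduce` task per document that still has at least one
/// operation with `sorted_index = None`.
pub fn reduce_tasks_for_unmaterialized(operations: &[StoredOperation]) -> Vec<Task> {
    // Assumption: a single reduce task covers all pending operations of a
    // document, so we deduplicate by document id (in sorted order).
    let pending: BTreeSet<&str> = operations
        .iter()
        .filter(|op| op.sorted_index.is_none())
        .map(|op| op.document_id.as_str())
        .collect();

    pending
        .into_iter()
        .map(|document_id| Task {
            worker: "reduce",
            document_id: document_id.to_string(),
        })
        .collect()
}
```

In a real node the input would come from a database query for operations without a `sorted_index`, and the resulting tasks would be handed to the materializer's task queue during startup, alongside the already existing re-attempt of pending tasks.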

adzialocha • Jul 07 '23 11:07