Potential consistency issue with GC scheme & synchronization
In considering alternatives for #190, the following scenario occurred to me, which it seems the current GC scheme may also be vulnerable to, so I wanted to open it as a separate issue to improve its visibility.
The problem arises if we cannot assume any particular ordering of changes when synchronizing between hosts (which I think is generally true).
- Assume hosts `A` and `B` with synchronized stores.
- GC is initiated on `A`, but interrupted.
- Both hosts are able to fully synchronize, so both hosts now reflect (the same) in-progress GC.
- GC is resumed on `A` and completes, so `A` now reflects only a single generation (`B` remains in the state from the prior step). Let us assume in this particular case there was nothing to GC, so all chunks ended up migrated to the new generation on `A`.
- Synchronization between `A` and `B` occurs, but does not fully complete. Specifically, let us assume all the `name` data has synchronized (moved to the newest generation on both `A` and `B`), but not all the `chunk` data has synchronized (some chunks still live under older generations on `B`).
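To make the end state concrete, here is a minimal sketch of what each host holds after this partial sync. It is purely illustrative: the dict-of-generations layout, the names, and the chunk ids are made up, not the project's actual data model.

```python
# Toy model: each generation records the names it holds (with the chunks each
# name references) and the chunk ids physically stored under that generation.

# Host A after its completed GC: a single generation holds everything.
store_a = {
    "gen-1": {
        "names": {"backup-1": {"c1", "c2", "c3"}},
        "chunks": {"c1", "c2", "c3"},
    },
}

# Host B after the interrupted sync: all name data arrived in the new
# generation, but chunks c2 and c3 still live only under the old one.
store_b = {
    "gen-0": {"names": {}, "chunks": {"c2", "c3"}},
    "gen-1": {"names": {"backup-1": {"c1", "c2", "c3"}}, "chunks": {"c1"}},
}
```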
At this point, if synchronization has not completed on `B` but a GC is issued on `B` (assume this occurs after the GC grace time period):
- `B` will see two generations locally:
  - `[0]`: The original generation `A` and `B` knew initially. This generation has chunks, but no names.
  - `[1]`: The new generation from the GC initiated and completed on `A`, which is only partially synced to `B`. In our assumed case, it contains all the names, but only some of the chunks.
- `B` will see this as a GC in progress, examine its oldest local generation (`[0]`), find that it contains no names, and wipe it (again, assume we are outside the GC grace time).
This leaves `B`'s local store with names that have missing chunks. The behavior on the next synchronization would depend on the synchronization mechanism, but even in the happy case where the missing data on `B` were restored from `A`, `B` would be left with a damaged store for some time.
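Continuing the toy model above, this is roughly the decision I am worried about: `B` resumes what looks like an in-progress GC and wipes its oldest generation because it contains no names. Again, a sketch of the failure mode as I understand it, not the project's actual code.

```python
store_b = {
    "gen-0": {"names": {}, "chunks": {"c2", "c3"}},
    "gen-1": {"names": {"backup-1": {"c1", "c2", "c3"}}, "chunks": {"c1"}},
}

def naive_resume_gc(store):
    # "The oldest generation holds no names, so nothing can still need its chunks."
    oldest = sorted(store)[0]
    if not store[oldest]["names"]:
        del store[oldest]

def missing_chunks(store):
    """Chunks referenced by some name but physically present in no generation."""
    present, referenced = set(), set()
    for gen in store.values():
        present |= gen["chunks"]
        for refs in gen["names"].values():
            referenced |= refs
    return referenced - present

naive_resume_gc(store_b)
print(missing_chunks(store_b))  # -> {'c2', 'c3'}: backup-1 on B is now damaged
```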
This is a bit contrived, and involves a specific sequence of interrupted actions and GC invocations on multiple hosts at specific times. Perhaps the GC grace time is considered sufficient mitigation ("we will surely fully sync within this window"); but I did want to raise this case as possible, at least under my understanding.
If this scenario is plausible, I believe that just prior to wiping a generation (while locked), you would need to double-check by visiting all names in younger generations and promoting any chunks they still need.
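A rough sketch of that check on the same toy model; `safe_wipe_oldest` is a hypothetical helper, and locking is assumed to happen around the call rather than being modelled here:

```python
def safe_wipe_oldest(store):
    gens = sorted(store)
    oldest, younger = gens[0], gens[1:]
    if not younger:
        return  # only one generation; nothing to wipe against

    # Chunks still referenced by names living in younger generations.
    needed = set()
    for gen in younger:
        for refs in store[gen]["names"].values():
            needed |= refs

    # Promote any of those chunks out of the doomed generation, then wipe it.
    rescue = needed & store[oldest]["chunks"]
    store[younger[-1]]["chunks"] |= rescue
    del store[oldest]
```

Applied to `B`'s pre-wipe state above, this would move `c2` and `c3` into the newest generation instead of discarding them, so `backup-1` would stay intact.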
I think your analysis of this scenario is correct.
The assumption is that the window before anything is deleted is large enough to ensure a full sync. When this assumption is broken, data might be lost in many scenarios.
E.g.
- A & B start synced
- B deletes some names, does a GC
- time passes, no sync happens in time
- B deletes the old chunks
- A adds new names to the old generation (the only one it knows of), assumes the existing chunks are there, and writes only the new ones
- sync happens; the deletes from B propagate to A; the name last written on A only has its newly written chunks, and the ones that used to exist are no more
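A toy walk-through of this second scenario, with made-up names and chunk ids, and sync/deduplication reduced to plain set operations:

```python
# A and B start synced: one name referencing one chunk.
store = {"names": {"old-backup": {"c1"}}, "chunks": {"c1"}}

# On B: the name is deleted, GC runs, and once the grace period passes c1 is removed.
deleted_names, deleted_chunks = {"old-backup"}, {"c1"}

# Meanwhile on A, which has not synced: a new name deduplicates against c1,
# so only the genuinely new chunk c2 is written.
store["names"]["new-backup"] = {"c1", "c2"}
store["chunks"].add("c2")

# Sync: B's deletions propagate to A.
for name in deleted_names:
    store["names"].pop(name, None)
store["chunks"] -= deleted_chunks

# new-backup now references a chunk that no longer exists on either host.
print(store["names"]["new-backup"] - store["chunks"])  # -> {'c1'}
```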