Potential consistency issue with GC scheme & synchronization
In considering alternatives for #190, the following scenario occurred to me, which it seems the current GC scheme may also be vulnerable to, so I wanted to open it as a separate issue to improve its visibility.
The problem arises if we cannot assume any particular ordering of changes when synchronizing between hosts (which I think is generally true).
- Assume hosts `A` and `B` with synchronized stores.
- GC is initiated on `A`, but interrupted.
- Both hosts are able to fully synchronize, so both hosts now reflect (the same) in-progress GC.
- GC is resumed on `A` and completes, so `A` now reflects only a single generation (`B` remains in the state from the prior step). Let us assume in this particular case there was nothing to GC, so all chunks ended up migrated to the new generation on `A`.
- Synchronization between `A` and `B` occurs, but does not fully complete. Specifically, let us assume all the `name` data has synchronized (moved to the newest generation on both `A` and `B`), but not all the `chunk` data has synchronized (some chunks still live under older generations on `B`).
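To make the end state concrete, here is a minimal sketch of what each host holds after this partial sync. It is purely illustrative: the dict-of-generations layout, the names, and the chunk ids are made up, not the project's actual data model.

```python
# Toy model: each generation records the names it holds (with the chunks each
# name references) and the chunk ids physically stored under that generation.

# Host A after its completed GC: a single generation holds everything.
store_a = {
    "gen-1": {
        "names": {"backup-1": {"c1", "c2", "c3"}},
        "chunks": {"c1", "c2", "c3"},
    },
}

# Host B after the interrupted sync: all name data arrived in the new
# generation, but chunks c2 and c3 still live only under the old one.
store_b = {
    "gen-0": {"names": {}, "chunks": {"c2", "c3"}},
    "gen-1": {"names": {"backup-1": {"c1", "c2", "c3"}}, "chunks": {"c1"}},
}
```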
At this point, if synchronization has not completed on `B` but a GC is issued on `B` (assume this occurs after the GC grace time period):
- `B` will see two generations locally:
  - `[0]`: The original generation `A` and `B` knew initially. This generation has chunks, but no names.
  - `[1]`: The new generation from the GC initiated and completed on `A`, which is only partially synced to `B`. In our assumed case, it contains all the names, but only some of the chunks.
- `B` will see this as a GC in progress, examine its oldest local generation (`[0]`), find that it contains no names, and wipe it (again, assume we are outside the GC grace time).
This leaves `B`'s local store with names that have missing chunks. The behavior on the next synchronization would depend on the synchronization mechanism, but even in the happy case where the missing data on `B` were restored from `A`, `B` would be left with a damaged store for some time.
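Continuing the toy model above, this is roughly the decision I am worried about: `B` resumes what looks like an in-progress GC and wipes its oldest generation because it contains no names. Again, a sketch of the failure mode as I understand it, not the project's actual code.

```python
store_b = {
    "gen-0": {"names": {}, "chunks": {"c2", "c3"}},
    "gen-1": {"names": {"backup-1": {"c1", "c2", "c3"}}, "chunks": {"c1"}},
}

def naive_resume_gc(store):
    # "The oldest generation holds no names, so nothing can still need its chunks."
    oldest = sorted(store)[0]
    if not store[oldest]["names"]:
        del store[oldest]

def missing_chunks(store):
    """Chunks referenced by some name but physically present in no generation."""
    present, referenced = set(), set()
    for gen in store.values():
        present |= gen["chunks"]
        for refs in gen["names"].values():
            referenced |= refs
    return referenced - present

naive_resume_gc(store_b)
print(missing_chunks(store_b))  # -> {'c2', 'c3'}: backup-1 on B is now damaged
```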
This is a bit contrived, and involves a specific sequence of interrupted actions and GC invocations on multiple hosts at specific times. Perhaps the GC grace time is considered sufficient mitigation ("we will surely fully sync within this window"); but I did want to raise this case as possible, at least under my understanding.
If this scenario is plausible, I believe that just prior to wiping a generation (while locked), you would need to double-check by visiting all names in younger generations and promoting any chunks they still need.
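A rough sketch of that check on the same toy model; `safe_wipe_oldest` is a hypothetical helper, and locking is assumed to happen around the call rather than being modelled here:

```python
def safe_wipe_oldest(store):
    gens = sorted(store)
    oldest, younger = gens[0], gens[1:]
    if not younger:
        return  # only one generation; nothing to wipe against

    # Chunks still referenced by names living in younger generations.
    needed = set()
    for gen in younger:
        for refs in store[gen]["names"].values():
            needed |= refs

    # Promote any of those chunks out of the doomed generation, then wipe it.
    rescue = needed & store[oldest]["chunks"]
    store[younger[-1]]["chunks"] |= rescue
    del store[oldest]
```

Applied to `B`'s pre-wipe state above, this would move `c2` and `c3` into the newest generation instead of discarding them, so `backup-1` would stay intact.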
I think your analysis of this scenario is correct.
The assumption is that the window before anything is deleted is large enough to ensure a full sync. When this assumption is broken, data might be lost in many scenarios.
E.g.
- A & B start synced
- B deletes some names, does a GC
- time passes, no sync happens in time
- B deletes the old chunks
- A adds new names to the old generation (the only one it knows of), assumes the existing chunks are there, and writes only the new ones
- sync happens; the deletes from B propagate to A; the name last written on A only has its newly written chunks, and the ones that used to exist are no more
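A toy walk-through of this second scenario, with made-up names and chunk ids, and sync/deduplication reduced to plain set operations:

```python
# A and B start synced: one name referencing one chunk.
store = {"names": {"old-backup": {"c1"}}, "chunks": {"c1"}}

# On B: the name is deleted, GC runs, and once the grace period passes c1 is removed.
deleted_names, deleted_chunks = {"old-backup"}, {"c1"}

# Meanwhile on A, which has not synced: a new name deduplicates against c1,
# so only the genuinely new chunk c2 is written.
store["names"]["new-backup"] = {"c1", "c2"}
store["chunks"].add("c2")

# Sync: B's deletions propagate to A.
for name in deleted_names:
    store["names"].pop(name, None)
store["chunks"] -= deleted_chunks

# new-backup now references a chunk that no longer exists on either host.
print(store["names"]["new-backup"] - store["chunks"])  # -> {'c1'}
```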