ouisync
Issues with move/rename
There are multiple issues with moving files/directories:
Cases
1. Loss of atomicity after partial sync
Move is an atomic operation on the replica performing it: the entry is either both removed from the source location and added to the destination location, or neither happens. However, when the repository is synced to a remote replica, this atomicity is lost, because the remote replica might not receive all the blocks affected by the move at the same time. There are four possible outcomes:
- Both the source and destination directory blocks are received
- Neither of them is received
- The source is received but the destination is not
- The destination is received but the source is not
The first two outcomes are not problematic: the remote replica sees the repository either as it was before the move or as it is after the move is fully completed. The third and fourth outcomes, however, are problematic. In the third, the user sees the entry removed from the source but not yet added to the destination, so an apparent data loss occurs (usually temporary, but possibly permanent). In the fourth, the user sees the entry in both the source and the destination, so the entry ends up having two (or more) parents (see Consequences below).
Note that this situation eventually resolves itself once all the directory blocks are received. However, the user might still observe (or even interact with) the repository before that happens. In the worst case, the connection between the replicas is lost and potentially never recovered, and the situation becomes permanent.
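To make the four outcomes concrete, here is a toy Rust model. It is a sketch under strong simplifying assumptions: a directory is just a set of entry names, and a remote replica that has not yet received an updated directory block keeps its old version of that directory. The names `Dir`, `perform_move`, `remote_view` and `visible_count` are hypothetical and do not correspond to ouisync's API.

```rust
use std::collections::BTreeSet;

// Hypothetical toy model: a directory is only the set of entry names it lists.
// Real ouisync directories are blobs stored in blocks; this is an illustration.
type Dir = BTreeSet<&'static str>;

/// (source, destination) contents after the local replica moves `name`.
fn perform_move(src: &Dir, dst: &Dir, name: &'static str) -> (Dir, Dir) {
    let mut new_src = src.clone();
    let mut new_dst = dst.clone();
    if new_src.remove(name) {
        new_dst.insert(name);
    }
    (new_src, new_dst)
}

/// What a remote replica sees: for each directory it uses the new version if
/// the updated block was received, otherwise it keeps the old version.
fn remote_view(old: (&Dir, &Dir), new: (&Dir, &Dir), got_src: bool, got_dst: bool) -> (Dir, Dir) {
    let src = if got_src { new.0 } else { old.0 };
    let dst = if got_dst { new.1 } else { old.1 };
    (src.clone(), dst.clone())
}

/// Number of directories in which `name` is visible to the remote user.
fn visible_count(view: &(Dir, Dir), name: &str) -> usize {
    [&view.0, &view.1].iter().filter(|d| d.contains(name)).count()
}

fn main() {
    let src = Dir::from(["file.txt"]);
    let dst = Dir::new();
    let (new_src, new_dst) = perform_move(&src, &dst, "file.txt");

    for (got_src, got_dst) in [(true, true), (false, false), (true, false), (false, true)] {
        let view = remote_view((&src, &dst), (&new_src, &new_dst), got_src, got_dst);
        println!(
            "src received: {got_src}, dst received: {got_dst} -> visible in {} place(s)",
            visible_count(&view, "file.txt")
        );
    }
}
```

Running it reports one visible location for the first two outcomes, zero for the third (apparent loss) and two for the fourth (multiple parents).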
2. Concurrent move of the same entry
Two replicas might concurrently move the same entry each to a different location. After they sync, the entry ends up having two (or more) parents.
3. Concurrent cyclic move
Say there are two directories, a and b. One replica moves a into b while the other replica concurrently moves b into a. After they sync, a cycle is created in which a is both the parent and the child of b, and at the same time both become detached from the directory tree and thus unreachable. As a result, they are eventually garbage collected, resulting in permanent data loss.
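Both concurrent scenarios (cases 2 and 3) can be reproduced with a deliberately naive model. This sketch assumes the tree is a set of (parent, child) edges and that syncing performs a three-way, per-edge merge: an edge survives if neither side removed it, and every edge added by either side is kept. This is not how ouisync stores or merges directories; `Edges`, `do_move`, `merge`, `parents` and `reachable` are hypothetical names used only to illustrate the effect.

```rust
use std::collections::{BTreeSet, VecDeque};

// Hypothetical model: the directory tree as a set of (parent, child) edges.
type Edges = BTreeSet<(&'static str, &'static str)>;

/// Move `child` under `new_parent` on a single replica.
fn do_move(tree: &Edges, child: &'static str, new_parent: &'static str) -> Edges {
    let mut out: Edges = tree.iter().filter(|(_, c)| *c != child).copied().collect();
    out.insert((new_parent, child));
    out
}

/// Naive three-way merge: keep base edges that neither replica removed,
/// plus every edge that either replica added.
fn merge(base: &Edges, a: &Edges, b: &Edges) -> Edges {
    let kept = base.iter().filter(|e| a.contains(*e) && b.contains(*e));
    let added = a.union(b).filter(|e| !base.contains(*e));
    kept.chain(added).copied().collect()
}

/// All directories that list `child`.
fn parents(tree: &Edges, child: &str) -> Vec<&'static str> {
    tree.iter().filter(|(_, c)| *c == child).map(|(p, _)| *p).collect()
}

/// Nodes reachable from "root" by following edges downwards (BFS).
fn reachable(tree: &Edges) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::from(["root"]);
    let mut queue = VecDeque::from(["root"]);
    while let Some(node) = queue.pop_front() {
        for &(p, c) in tree {
            if p == node && seen.insert(c) {
                queue.push_back(c);
            }
        }
    }
    seen
}

fn main() {
    // Case 2: both replicas move `file` out of root, each to a different directory.
    let base = Edges::from([("root", "a"), ("root", "b"), ("root", "file")]);
    let merged = merge(&base, &do_move(&base, "file", "a"), &do_move(&base, "file", "b"));
    println!("parents of file: {:?}", parents(&merged, "file"));

    // Case 3: one replica moves a into b, the other moves b into a.
    let base = Edges::from([("root", "a"), ("root", "b")]);
    let merged = merge(&base, &do_move(&base, "a", "b"), &do_move(&base, "b", "a"));
    println!("reachable from root: {:?}", reachable(&merged));
}
```

In the first scenario `file` ends up with the parents `["a", "b"]`; in the second the merged edges are exactly `a -> b` and `b -> a`, and only `root` is still reachable.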
Consequences
Multiple parents
- Potential confusion because it looks like the entry exists in multiple copies while in reality it's the same entry (same blob), just pointed to from multiple directories
- Removing the entry from one of its locations leaves it in the others. Only when the entry is removed from all its locations is the underlying blob removed as well. Thus the entry behaves like a hardlink, even though there is no obvious way to tell it's a hardlink (if the repo is mounted, ls still lists the entry as having only one link). However, apart from the confusion, at least no data loss or data corruption happens: when the file is modified from one location, only the version vector pertaining to that location is updated. This can cause further issues when the file is modified concurrently and each replica modifies it from a different location (TODO: what exactly happens in this case?)
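The hardlink-like behaviour follows from the blob staying alive as long as any directory still points at it. A minimal sketch of that rule, assuming a hypothetical `Repo` in which each directory maps entry names to a numeric blob id (not ouisync's real data structures):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical blob identifier standing in for ouisync's real blob ids.
type BlobId = u32;

// Hypothetical model: directory name -> (entry name -> blob it points at).
#[derive(Default)]
struct Repo {
    dirs: BTreeMap<&'static str, BTreeMap<&'static str, BlobId>>,
}

impl Repo {
    fn insert(&mut self, dir: &'static str, name: &'static str, blob: BlobId) {
        self.dirs.entry(dir).or_default().insert(name, blob);
    }

    /// Removes one directory entry, like deleting the file at a single path.
    fn remove(&mut self, dir: &str, name: &str) {
        if let Some(d) = self.dirs.get_mut(dir) {
            d.remove(name);
        }
    }

    /// Blobs still referenced from at least one directory; anything not in
    /// this set would be eligible for garbage collection.
    fn live_blobs(&self) -> BTreeSet<BlobId> {
        self.dirs.values().flat_map(|d| d.values().copied()).collect()
    }
}

fn main() {
    let mut repo = Repo::default();
    // After a concurrent move, the same blob 7 is listed under both `a` and `b`.
    repo.insert("a", "report.txt", 7);
    repo.insert("b", "report.txt", 7);

    repo.remove("a", "report.txt");
    println!("after first removal: {:?}", repo.live_blobs());

    repo.remove("b", "report.txt");
    println!("after second removal: {:?}", repo.live_blobs());
}
```

The blob survives the first removal (`{7}`) and only becomes collectable after the second (`{}`), which is exactly how a hardlink with two links would behave.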
Data loss
In case 1, the data loss is only apparent, because the blob is not actually deleted (the garbage collector is suppressed while there are directories with missing blocks). The entry is nevertheless inaccessible to the user, so from their point of view it's lost.
In case 3, however, the data loss is permanent. This is probably the worst outcome a move can currently cause (that we know of).
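The difference between the two cases comes down to whether garbage collection is allowed to run. The following is a hypothetical sketch of the rule described above: GC is skipped entirely while any directory has missing blocks; otherwise every blob unreachable from the root is deleted. `Snapshot` and `collect_garbage` are illustrative names, not ouisync's actual garbage collector.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical snapshot of what one replica currently has.
struct Snapshot {
    /// directory -> entries it currently lists (directories or files)
    children: BTreeMap<&'static str, Vec<&'static str>>,
    /// every blob present in the local store
    blobs: BTreeSet<&'static str>,
    /// true if some directory still has blocks that were not received yet
    has_missing_blocks: bool,
}

/// Returns the blobs that would be deleted: those unreachable from "root".
/// Nothing is deleted while any directory is incomplete, because a missing
/// block might still reference the blob.
fn collect_garbage(s: &Snapshot) -> BTreeSet<&'static str> {
    if s.has_missing_blocks {
        return BTreeSet::new();
    }
    let mut reachable = BTreeSet::new();
    let mut stack = vec!["root"];
    while let Some(node) = stack.pop() {
        if reachable.insert(node) {
            if let Some(kids) = s.children.get(node) {
                stack.extend(kids.iter().copied());
            }
        }
    }
    s.blobs.difference(&reachable).copied().collect()
}

fn main() {
    // Case 1: the new source block arrived, the new destination block did not,
    // so `file` is listed nowhere, but GC is suppressed.
    let case1 = Snapshot {
        children: BTreeMap::from([("root", vec!["src", "dst"])]),
        blobs: BTreeSet::from(["root", "src", "dst", "file"]),
        has_missing_blocks: true,
    };
    println!("case 1 deletes: {:?}", collect_garbage(&case1));

    // Case 3: all blocks are present, but a and b only reference each other.
    let case3 = Snapshot {
        children: BTreeMap::from([("root", vec![]), ("a", vec!["b"]), ("b", vec!["a"])]),
        blobs: BTreeSet::from(["root", "a", "b"]),
        has_missing_blocks: false,
    };
    println!("case 3 deletes: {:?}", collect_garbage(&case3));
}
```

Case 1 deletes nothing (`{}`), while case 3 deletes `{"a", "b"}` even though no replica ever asked for them to be removed.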
Ideal outcome
- In case 1, ideally we want the move to appear atomic to remote replicas as well. That is, we want them to see the entry either only in the source location or only in the destination location, but never in both or in neither.
- In cases 2 and 3, one possible approach would be to choose one of the operations as "winning" and discard the others. The choice of which operation wins can be arbitrary, but it must be such that every replica eventually sees the same outcome. Discarding the "losing" operations is OK because no data loss occurs: the entry is still there, just potentially in a different location than the user wanted. This is no different from the user first moving the entry, then syncing with a remote replica, that remote replica moving the entry to yet another location, and finally syncing back to the original user. Thus this should not cause any confusion, assuming the user understands that ouisync is a distributed/collaborative tool.
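One way to make the choice deterministic, assuming each move carries a Lamport-style logical counter and a unique replica id (this issue does not say ouisync records either), is to order moves by (counter, replica id) and let the greatest one win. This is a last-writer-wins rule, sketched below with a hypothetical `MoveOp` type and `winner` function:

```rust
/// Hypothetical record of one of several concurrent moves of the same entry.
/// `counter` is an assumed Lamport-style logical clock and `replica` an assumed
/// unique replica id; neither is an existing ouisync field.
/// Derived `Ord` compares fields in declaration order, so moves are ordered by
/// `counter` first and `replica` breaks ties.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct MoveOp {
    counter: u64,
    replica: u64,
    dest: &'static str,
}

/// Returns the winning move: the greatest by (counter, replica).
/// The result does not depend on the order in which the moves arrived.
fn winner(ops: &[MoveOp]) -> Option<MoveOp> {
    ops.iter().copied().max()
}

fn main() {
    let a = MoveOp { counter: 5, replica: 1, dest: "x" };
    let b = MoveOp { counter: 5, replica: 2, dest: "y" };
    // Replica 1 learns about [a, b], replica 2 about [b, a]: same winner.
    assert_eq!(winner(&[a, b]), winner(&[b, a]));
    println!("{:?}", winner(&[a, b]).map(|op| op.dest)); // Some("y")
}
```

Because the winner depends only on the set of operations, every replica that has seen the same moves converges on the same location. On its own this does not cover case 3, where the two moves concern different entries; there a cycle check is also needed.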
Solution
The paper "A highly-available move operation for replicated trees" describes an algorithm that seems to solve this problem. Further research is needed to determine whether it's applicable to ouisync.
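As far as we understand that paper, its core idea is: give every move a globally unique, totally ordered timestamp; keep a log of applied moves together with each child's previous parent; apply moves in timestamp order by undoing newer logged moves, applying the incoming one and redoing the undone ones; and skip any move that would make a node its own ancestor. The following is a heavily simplified sketch of that idea (no metadata, no node creation, no log truncation). `Move`, `Tree`, `apply` and the (counter, replica id) timestamp are hypothetical and not ouisync code, and whether this maps onto ouisync's block-based storage is exactly the open question above.

```rust
use std::collections::HashMap;

// Hypothetical timestamp: (Lamport counter, replica id), assumed unique.
type Timestamp = (u64, u64);

#[derive(Clone, Copy, Debug)]
struct Move { ts: Timestamp, parent: &'static str, child: &'static str }

/// Logged move plus the child's previous parent, so the move can be undone.
struct LogEntry { op: Move, old_parent: Option<&'static str> }

#[derive(Default)]
struct Tree {
    parent_of: HashMap<&'static str, &'static str>,
    log: Vec<LogEntry>, // sorted by timestamp, oldest first
}

impl Tree {
    /// True if `anc` is `node` itself or one of its ancestors.
    fn is_ancestor(&self, anc: &str, mut node: &'static str) -> bool {
        loop {
            if node == anc { return true; }
            match self.parent_of.get(node) {
                Some(&p) => node = p,
                None => return false,
            }
        }
    }

    /// Applies a move unless it would create a cycle; a skipped move is still logged.
    fn do_op(&mut self, op: Move) -> LogEntry {
        let old_parent = self.parent_of.get(op.child).copied();
        if !self.is_ancestor(op.child, op.parent) {
            self.parent_of.insert(op.child, op.parent);
        }
        LogEntry { op, old_parent }
    }

    /// Restores the child's parent from before the logged move.
    fn undo_op(&mut self, entry: &LogEntry) {
        match entry.old_parent {
            Some(p) => self.parent_of.insert(entry.op.child, p),
            None => self.parent_of.remove(entry.op.child),
        };
    }

    /// Undo every logged move newer than `op`, apply `op`, then redo them.
    fn apply(&mut self, op: Move) {
        let mut undone = Vec::new();
        while self.log.last().map_or(false, |e| e.op.ts > op.ts) {
            let entry = self.log.pop().unwrap();
            self.undo_op(&entry);
            undone.push(entry.op);
        }
        let entry = self.do_op(op);
        self.log.push(entry);
        for redo in undone.into_iter().rev() {
            let entry = self.do_op(redo);
            self.log.push(entry);
        }
    }
}

fn main() {
    let setup = [
        Move { ts: (0, 0), parent: "root", child: "a" },
        Move { ts: (0, 1), parent: "root", child: "b" },
    ];
    let a_into_b = Move { ts: (1, 1), parent: "b", child: "a" }; // on replica 1
    let b_into_a = Move { ts: (1, 2), parent: "a", child: "b" }; // on replica 2

    // The two replicas receive the concurrent moves in opposite orders.
    let (mut r1, mut r2) = (Tree::default(), Tree::default());
    for op in setup.iter().copied().chain([a_into_b, b_into_a]) { r1.apply(op); }
    for op in setup.iter().copied().chain([b_into_a, a_into_b]) { r2.apply(op); }

    assert_eq!(r1.parent_of, r2.parent_of);
    println!("a -> {}, b -> {}", r1.parent_of["a"], r1.parent_of["b"]);
}
```

Replaying the cyclic move from case 3 in both delivery orders leaves both replicas with a under b and b under root (`a -> b, b -> root`): the move with the later timestamp would create a cycle and is skipped instead of detaching both directories.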
Another potentially relevant paper: https://arxiv.org/pdf/2103.04828.pdf
https://redmine.equalit.ie/issues/31636