Descriptive details on allocation repair
We have a consensus mechanism on the client side to accept an operation as successful (the client needs to handle it).
The basic flow is: first we send the operation request to the blobbers, and each blobber stores the change temporarily. If consensus is met, we then send a commit request to the succeeding blobbers. This commit request actually applies the changes requested in the first step; it carries a writemarker as well.
To consider an operation as successful, we need data + 1 succeeded operations. For example, with an allocation of 10 data shards and 6 parity shards, a file operation is considered successful as long as it succeeds on at least 11 blobbers. When that happens, the allocation is in need of repair, because we want to maintain the parity shards so the data stays as recoverable as possible.
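As a minimal sketch of this threshold (all names below are illustrative assumptions, not actual gosdk identifiers):

```go
package main

import "fmt"

// consensusMet reports whether an operation has enough successful
// blobbers; the threshold is dataShards + 1.
func consensusMet(succeeded, dataShards int) bool {
	return succeeded >= dataShards+1
}

func main() {
	dataShards, parityShards := 10, 6
	total := dataShards + parityShards // 16 blobbers

	succeeded := 11 // the minimum that still meets consensus
	fmt.Println(consensusMet(succeeded, dataShards)) // true
	fmt.Println(total-succeeded, "blobbers need repair") // 5
}
```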
Now consider a rename operation on a file. Say the rename request succeeded on all 16 blobbers, so we move to the second step to commit. If 8 blobbers succeed and 8 fail, the operation is considered failed, but 8 blobbers hold the old name and 8 hold the new name. For the client, the file is effectively lost.
In the case above, we try to undo the change instantly to minimize the failure's effect on the allocation. The undo for each operation is as follows (see the sketch after this list):
- Upload --> Delete the file in blobbers
- Update --> The blobber should also hold the original content temporarily so the update can be undone
- Rename --> Rename the succeeding blobbers' file back to the old name
- Copy --> Delete the file
- Move --> Similar to rename. This operation is a combination of copy + delete; we should make it atomic.
- Delete --> There is no undo. The client cannot find this file anyway, but repair should delete it.
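A hedged sketch of the mapping above; the Operation values and the helper are assumptions for illustration, not gosdk APIs:

```go
package repair

// Operation identifies a writemarker operation; the string values
// are assumptions, not confirmed gosdk constants.
type Operation string

const (
	OpUpload Operation = "upload"
	OpUpdate Operation = "update"
	OpRename Operation = "rename"
	OpCopy   Operation = "copy"
	OpMove   Operation = "move"
	OpDelete Operation = "delete"
)

// UndoFor describes the compensating action for a half-applied operation.
func UndoFor(op Operation) string {
	switch op {
	case OpUpload:
		return "delete the uploaded file on succeeding blobbers"
	case OpUpdate:
		return "restore the original content held temporarily by the blobber"
	case OpRename:
		return "rename succeeding blobbers' file back to the old name"
	case OpCopy:
		return "delete the copied file"
	case OpMove:
		return "undo the copy and the delete together (atomically)"
	case OpDelete:
		return "no undo; repair should complete the delete"
	default:
		return "unknown operation"
	}
}
```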
We would require the following implementations to achieve proper repair functionality.
FileMetaRoot
It is a hash calculated over all file metadata. Unlike the allocation root, the FileMetaRoot will be the same across all blobbers. The following fields of the file meta should be included:
- Path
- Type
- Size
- ActualFileSize
- ActualFileHash
- Child hash (hash of the children's metadata hashes)
FileMetaRoot should be calculated like allocationRoot.
Note: FileMetaRoot will be used to know if repair is required
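A minimal sketch of how such a per-file meta hash could be derived from the listed fields; the struct shape, the ":"-joined encoding, and the use of SHA-256 are assumptions:

```go
package meta

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileMeta carries only the fields that feed the FileMetaRoot.
type FileMeta struct {
	Path           string
	Type           string // e.g. "f" for file, "d" for directory
	Size           int64
	ActualFileSize int64
	ActualFileHash string
	Children       []*FileMeta
}

// MetaHash hashes this node's fields together with its children's
// hashes, so identical trees produce identical roots on every blobber.
func (m *FileMeta) MetaHash() string {
	childHashes := ""
	for _, c := range m.Children {
		childHashes += c.MetaHash()
	}
	payload := fmt.Sprintf("%s:%s:%d:%d:%s:%s",
		m.Path, m.Type, m.Size, m.ActualFileSize, m.ActualFileHash, childHashes)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

// FileMetaRoot is simply the meta hash of the allocation's root directory.
func FileMetaRoot(root *FileMeta) string {
	return root.MetaHash()
}
```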
Writemarker Modification
It is also important to know which operation was done, so that we can tell why a blobber's FileMetaRoot differs from the other blobbers' FileMetaRoots. We need to add an operation field to the WriteMarker. The new WriteMarker would look like this:
```go
type WriteMarker struct {
	AllocationRoot         string
	PreviousAllocationRoot string
	AllocationID           string
	Size                   int64
	BlobberID              string
	Timestamp              common.Timestamp
	ClientID               string
	Signature              string
	Path                   string
	FileID                 int // some file ID that will be unique for its lifetime
	Operation              string
}
```
FileID seems to be a must; otherwise we would not know which file path to operate on.
With the fileMetaRoot and the WriteMarker changes in place, we can list the writemarkers of each blobber, walk back until we find a fileMetaRoot common to all blobbers, and start the repair from that point.
If one blobber has fileMetaRoots h1, h2, h3, h4, h5 and another has h1, h2, h3, then repair should make the second blobber first attain h4 and then h5. There will be some exceptions, however. For example, if a rename is still pending on one blobber but the file is later deleted, we don't need to rename; we can directly delete that file. Again, fileID is essential for repair to be functional.
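A sketch of the catch-up selection, assuming each blobber exposes its ordered fileMetaRoot history (the function name is hypothetical):

```go
// replayFrom returns the fileMetaRoots the lagging blobber still has
// to attain, in order, given each blobber's ordered history.
func replayFrom(healthy, lagging []string) []string {
	i := 0
	// Advance past the common prefix (h1, h2, h3 in the example above).
	for i < len(healthy) && i < len(lagging) && healthy[i] == lagging[i] {
		i++
	}
	// Everything after the common prefix must be replayed in order.
	return healthy[i:]
}
```

For the example above, replayFrom([]string{"h1", "h2", "h3", "h4", "h5"}, []string{"h1", "h2", "h3"}) returns ["h4", "h5"].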
Note: Writemarker changes will make GDPR reporting accurate and effective
The idea of fileID seems at odds with a distributed system, imo. A file could be uploaded, renamed, moved, updated, etc. At what point is it no longer the same file?
For move or rename operations, it is just a metadata change, so there is very little impact in temporarily accommodating more than one instance. In this case, knowing the file content hash can be useful, regardless of its current path, albeit an allocation could have multiple instances of an identical file.
But for file updates, the impact on blobbers is potentially significant. A fairer way of doing this could be incremental versions of a file: the user pays for both versions until it is satisfied that it has sufficient consensus to delete the old version.
Also consider the example of a higher EC ratio, like 10/30. There could be an instance where 11 data parts have been successfully updated but 19 have not.
@sculptex Your point itself directs us towards having fileID.
Think about this: if there are two operations on a file, a rename and then an update, it will not be repairable, because the file is effectively lost between the two writemarkers. Also, we cannot rely on the content hash because, as you said, there can be identical content and we would not know which file to update.
FileID should be provided from the client side, and it should be included in the writemarker root hash calculation as well.
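A hedged sketch of a signature payload that now covers FileID and Operation, extending the WriteMarker struct shown above; the field order, separator, and hash choice are assumptions, and common.Timestamp is treated as an integer:

```go
// Assumes the imports "crypto/sha256", "encoding/hex", and "fmt".
func (wm *WriteMarker) signaturePayload() string {
	payload := fmt.Sprintf("%s:%s:%s:%s:%d:%d:%d:%s",
		wm.AllocationRoot, wm.PreviousAllocationRoot, wm.AllocationID,
		wm.BlobberID, wm.Size, wm.Timestamp, wm.FileID, wm.Operation)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}
```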
Sure, if fileID is a client-provided value, that could work, but think of a sync scenario where the same client uploads from multiple points; there could be a clash. One option, though, could be for the ID to be random rather than sequential?
I agree that concurrent updates from different devices would be a concern.
Any worry about the size of the random FileID? If so, another option might be to assign a prefix to each device, so that there would be no clashes if files were being created from two different devices from the same client.
Even a random 32-bit fileID would give a negligible chance of collision. (If one already exists, another is chosen.)
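A small sketch of that retry-on-collision approach; the `exists` lookup is a stand-in for checking the allocation's file tree:

```go
import (
	"crypto/rand"
	"encoding/binary"
)

// newFileID draws random 32-bit IDs until an unused one is found.
func newFileID(exists func(uint32) bool) (uint32, error) {
	for {
		var b [4]byte
		if _, err := rand.Read(b[:]); err != nil {
			return 0, err
		}
		id := binary.BigEndian.Uint32(b[:])
		if !exists(id) {
			return id, nil
		}
	}
}
```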
The device prefix (well, a random starting point for each device, incremented from there) was my first thought, but then that fileID counter has to be stored somewhere.
Also consider simultaneous sync from 2 devices with the same file path: there could be two competing instances of the same file, each with a different fileID.
I think the writemarker locking mechanism can help here, letting us simply use an incremental fileID starting from 1. Also, we are only concerned with assigning fileIDs while uploading files and creating directories; for other file operations we use the existing fileID.
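A minimal stand-in for why the lock makes a plain counter safe; the real gosdk Allocation type and lock API are different:

```go
import "sync"

// Allocation is a toy stand-in; mu plays the role of the writemarker lock.
type Allocation struct {
	mu        sync.Mutex
	maxFileID int
}

// NextFileID hands out incremental fileIDs starting from 1. Since only
// one writemarker can be committed at a time per allocation, two devices
// can never claim the same ID.
func (a *Allocation) NextFileID() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.maxFileID++
	return a.maxFileID
}
```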
@taustin So are we good to go?
So the idea is to use the writemarker locking mechanism to make sure that we don't have conflicts on the fileID numbering? If so, I'm good with that.