leofs
[Discussion] Eventual Consistency may make user confused in error case
With N=3 and W=2, a user receives an error message in two situations:
- No write accepted: the new data will never be in place.
- 1 write accepted: the write quorum is not reached, but the new data will be in place.
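A minimal Python sketch of the two situations above, assuming a coordinator that simply counts acknowledgements from the N=3 replicas and compares them against W=2. The names `quorum_write` and `WriteOutcome` are hypothetical and are not part of LeoFS; they only illustrate why both cases look identical ("error") to the user while leaving different data behind.

```python
from dataclasses import dataclass
from typing import List

N = 3  # replicas per object (from the example above)
W = 2  # write quorum (from the example above)


@dataclass(frozen=True)
class WriteOutcome:
    acks: int              # replicas that acknowledged the write
    quorum_met: bool       # True when acks >= W; otherwise the user gets an error
    known_persisted: bool  # the coordinator knows at least one replica stored the data


def quorum_write(replica_acks: List[bool], w: int = W) -> WriteOutcome:
    """Hypothetical coordinator logic: count acks and compare against the quorum."""
    if len(replica_acks) != N:
        raise ValueError(f"expected {N} replica results, got {len(replica_acks)}")
    acks = sum(replica_acks)
    return WriteOutcome(acks=acks, quorum_met=acks >= w, known_persisted=acks >= 1)


if __name__ == "__main__":
    # Situation 1: no write accepted -> error, nothing known to be stored.
    print(quorum_write([False, False, False]))
    # Situation 2: one write accepted -> error, yet one replica holds the new data.
    print(quorum_write([True, False, False]))
```

Both calls return `quorum_met=False`, so both surface as the same error, but only the second has `known_persisted=True`.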
Some users may understand this as what eventual consistency gives them; others may be confused.
I want to discuss what we can do in this case.
One possible solution would be to return the error with more detail, such as the number of writes accepted.
As for a write after an error, it depends on the use case: when 1 write was accepted, a client may choose not to retry and rely on eventual consistency instead (though in most such cases the cluster should simply be operated with W=1); when 0 writes were accepted, a client must retry.
As for a read after an error, assuming no conflict: when 1 write was accepted, a client will get what the previous write put. When 0 writes were accepted, on the other hand, a client may or may not get what the previous write put, because there are two kinds of "no write accepted" errors:
- The write did not reach any storage node, due to a temporary network partition.
- No storage node responded to the coordinator, but the write actually reached at least one of them. In this case, a client GET will succeed once the network trouble is gone.
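The ambiguity of the 0-accepted case can be sketched by separating two facts per replica: whether the node applied the write, and whether its acknowledgement made it back to the coordinator. This is a hypothetical model (the `ReplicaResult` type and helper functions are invented for illustration, not LeoFS code); it shows that both kinds of error produce the same count at the coordinator.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaResult:
    applied: bool        # the storage node actually wrote the object
    ack_delivered: bool  # the node's acknowledgement reached the coordinator


def acks_seen(results: List[ReplicaResult]) -> int:
    """What the coordinator counts as 'accepted'."""
    return sum(r.applied and r.ack_delivered for r in results)


def replicas_holding_data(results: List[ReplicaResult]) -> int:
    """What is actually stored across the cluster."""
    return sum(r.applied for r in results)


# Kind 1: the request never reached any storage node (temporary partition).
never_reached = [ReplicaResult(False, False) for _ in range(3)]

# Kind 2: one node stored the object, but its acknowledgement was lost.
ack_lost = [ReplicaResult(True, False)] + [ReplicaResult(False, False) for _ in range(2)]

for name, case in (("never_reached", never_reached), ("ack_lost", ack_lost)):
    print(name, acks_seen(case), replicas_holding_data(case))
```

This prints `never_reached 0 0` and `ack_lost 0 1`: the coordinator reports 0 accepted in both cases, but only the second leaves data that a later GET can return.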
That said, also returning the number of writes accepted may give clients a little more information and make it a bit clearer what happened, but IMHO it would still leave them fairly confused.
So, going a step further, we could also return whether the coordinator actually sent the write request to any storage node at all (it may not, when leo_redundant_manager has already detected that all nodes are unavailable), in addition to the number of writes accepted. When no request was sent, this guarantees that the write did not succeed on any node. It might help in some cases.
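A sketch of how a client could interpret such a richer error, assuming a hypothetical error payload with two fields: `accepted` (number of writes accepted) and `dispatched` (whether the coordinator sent the request to any storage node). Neither field exists in LeoFS today; the names and the retry policy are illustrative only, following the reasoning in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class WriteState(Enum):
    NOT_WRITTEN = "not_written"      # guaranteed: no storage node received the request
    WILL_CONVERGE = "will_converge"  # >= 1 replica accepted; eventual consistency should spread it
    UNKNOWN = "unknown"              # request was sent, but no acknowledgement came back


@dataclass(frozen=True)
class QuorumError:
    accepted: int     # hypothetical field: number of writes accepted
    dispatched: bool  # hypothetical field: request sent to at least one storage node

    def __post_init__(self) -> None:
        # A write cannot be accepted by a node that never received it.
        if not self.dispatched and self.accepted > 0:
            raise ValueError("accepted > 0 requires dispatched=True")


def classify(err: QuorumError) -> WriteState:
    if not err.dispatched:
        # e.g. every node was already marked unavailable, so nothing was sent.
        return WriteState.NOT_WRITTEN
    if err.accepted >= 1:
        return WriteState.WILL_CONVERGE
    # Sent but unacknowledged: the data may or may not exist on some node.
    return WriteState.UNKNOWN


def must_retry(err: QuorumError) -> bool:
    """Conservative policy: only skip the retry when some replica accepted."""
    return classify(err) is not WriteState.WILL_CONVERGE
```

The `dispatched` flag is what separates `NOT_WRITTEN` from `UNKNOWN`; with only the accepted count, both collapse into the same ambiguous 0.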