CCF icon indicating copy to clipboard operation
CCF copied to clipboard

Permanently delete items in the KV champ maps

Open eddyashton opened this issue 5 years ago • 2 comments

Currently we keep deleted items around forever with a tombstone entry. The value is minimal, but the entire key must be kept. We now have support for removing things from the underlying champ, so should use this. This will require some extra testing to make sure the semantics are correct and well understood.

There was an initial attempt in this PR, #1781. The comments there summarise the open questions.

eddyashton avatar Dec 08 '20 13:12 eddyashton

I've been thinking about this further, and convinced myself again that its actually safe to permanently delete things, as any other 'real' conflicts would be caught by other dependencies, or else there's no conflict. Also experimenting with diagrams to explain these tx conflicts.

Normal conflicts

So a normal conflict looks like this:

        1              2               3
A [ K=1 ]              |               |
B       | [ a=K? K=a+1 ]               |
C       |  [ a=K? K=a+1  X[ a=K? K=a+1 ]

Time goes left-to-right, versions are listed at the top. Transaction A writes a value to key K, and applies this at version 1. B and C try to do the same workload which is "increment K by 1". They both start after 1 is applied, so read the state written by version 1. C is shown as running slightly later, but really they're concurrent and their only ordering is who tries to commit first. B is applied first, creating version 2. Then when C tries to apply, it conflicts (marked by X), because its read of K (K?) was at version 1, but K has now been written at version 2. So C is retried, now reading the state at version 2 (written by B), and eventually applies at version 3.

The goal here is to produce the same results as if the transactions executed in sequence. So if 3 hadn't read K, but had read some other keys, we can execute it with a read version of 1 but we get the same result as if we'd had a read version of 2.

Deletions

Now we deal with deleting keys, written as !K, and here's the case that I was concerned about:

       1         2      3 
A [ !K ]         |      | 
B      |   [ K=1 ]      | 
C      |         | [ !K ] 
D      | [ a=K? K'=a?     ]

D is executed reading at version 1, checks if K exists, and writes this somewhere else in the KV. While its executing, other transactions create/modify K, maybe multiple times, and eventually delete it before D completes.

When D completes, does it need to conflict and retry, because K was modified while it was executing? If it does, then we need to store a tombstone for K in the KV, for at least as long as D is executing, so that when D comes to conflict we can compare "you depended on deleted-at-1" against "K is now deleted-at-3".

But I think this doesn't need to be a conflict. The result is equivalent to what would happen if D was executed on the state at version 3, because its decisions are based purely on K's existence/non-existence, which is the same in both cases. In other words, this concurrent execution has produced the same state as the linear sequence [A, B, C, D].

We only need have real positive Versions or NoVersion, and if we don't have the same value at the end as the start its a conflict. But we don't need to distinguish different kinds of NoVersion.

Thoughts?

If my reasoning here is correct, then we can make a few changes in the KV (reading a deletion is a dependency on kv::NoVersion, as though it never existed) and delete keys permanently when they're removed. Am I missing some other case?

eddyashton avatar Jan 29 '21 14:01 eddyashton

@eddyashton As discussed, I agree with the above. Another way to put it is that our current read-dependency conflict checker currently compares the version of the key in the Tx's read set compared to the most recent version of the key in the Map. However, if a concurrent Tx had written to the key but not actually modified the associated value (i.e. writing the value that was already there), our current conflict checker will raise a conflict, even though "our" Tx could successfully commit, as all dependent values haven't changed. However, this type of value conflict checker would be much more expensive (i.e. deserialise the value) than checking the version, so comparing the version is a good-enough and cheap approximation.

I see read-dependency conflict on a deleted key a special case of value checker conflict, except that it is much cheaper because the key doesn't exist in this case (no expensive value deserialisation). So I agree that it should be safe to only check to commit a Tx that read a key which didn't exist, even if concurrent Txs create and delete that key in the meantime.

jumaffre avatar Feb 18 '21 10:02 jumaffre