
[Specs] Add TLA+ Specification for a Bounded Collection Operation

Open ZacAttack opened this issue 1 year ago • 2 comments

What Is this?

This is a TLA+ specification which explores the concept of a BoundedCollection type as a CRDT.

Background

Today in Venice we support CRDT-based operations on collections for write compute. These operations include the ability to add to a collection, remove from a collection, and set a collection to a value (either blank or a full collection of elements as a full update). An important property of this data type is that these operations are commutative: they can be replayed in any order and arrive at the same consistent result. This matters for Active-Active (AA) replication, where two data centers may apply the same operations in different orders.
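To build intuition, here is a minimal Python sketch (not Venice's actual implementation, and a state-based simplification that ignores the full-update operation) of why timestamped add/remove with a tombstone list converges regardless of the order in which operations are seen:

```python
# Minimal sketch: each element's fate depends only on the latest add timestamp
# vs. the latest remove (tombstone) timestamp for that value, and taking a max
# is commutative, so any application order yields the same surviving set.

def merge(ops):
    """ops: iterable of (kind, value, timestamp); returns the surviving values."""
    latest_add, latest_del = {}, {}
    for kind, value, ts in ops:
        table = latest_add if kind == "ADD" else latest_del
        table[value] = max(table.get(value, ts), ts)
    # An element survives if its newest add is newer than its newest tombstone.
    return {v for v, ts in latest_add.items() if ts > latest_del.get(v, -1)}

ops = [("ADD", "a", 1), ("REMOVE", "a", 2), ("ADD", "b", 3)]
assert merge(ops) == merge(reversed(ops)) == {"b"}  # any order, same result
```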

One problem we currently suffer from, though, is that there is no bound on the size of a collection field in Venice. Since writers work on one entry at a time, a record can grow very large, leading to expensive update and read operations in the database.

This spec introduces the notion of a bounded collection which has a max size and retains the top N elements.

Ideally, a bounded collection would support the same operations available to an unbounded collection, with the addition of the max collection size constraint. This spec explores that idea and makes the following properties plain:

  • Adding and removing elements with a tombstone list of an unbounded collection is commutative.
  • Adding and removing elements with a tombstone list and a max size is NOT commutative.

The spec as published in this PR fails model checking. It can pass with some tweaks to the cfg file: setting MAX_SIZE to less than zero removes the bound and makes it pass, and having no DELETE operations also makes the checker pass.

The scenario flagged by the model checker in this PR looks like the following:

Say that we have the following operations:

INSERTS:

  • <<1, 1>>, <<2, 2>>, <<3, 3>>

DELETES:

  • <<3,4>>

MAX_SIZE:

  • 2

Operations are structured with <<value, timestamp>> and a type (insert, delete).

If we followed the natural order of these entries (by timestamp), the states of the database would be:

t0: {} <-- INSERT<<1,1>>
t1: {<<1,1>>} <-- INSERT<<2,2>>
t2: {<<1,1>>, <<2,2>>} <-- INSERT<<3,3>>
t3: {<<2,2>>, <<3,3>>} <-- DELETE<<3,4>>
t4: {<<2,2>>}

Following the natural order of events, we have a final result of {<<2,2>>}. Let's see what happens when we switch up the order a bit and have the insert for three and the delete for three come first. Now it looks like:

t0: {} <-- INSERT<<3,3>>
t1: {<<3,3>>} <-- DELETE<<3,4>>
t2: {} <-- INSERT<<1,1>>
t3: {<<1,1>>} <-- INSERT<<2,2>>
t4: {<<1,1>>, <<2,2>>}

We now have a final result of {<<1,1>>, <<2,2>>}, which is different from the earlier result. These operations are therefore no longer commutative!

This appears to happen because we have introduced a new reachable state in the state machine that implicitly adds an extra operation: a delete that occurs on the set once it reaches its max size. In the second example, that state was never reached, so the two examples effectively apply different sets of operations and diverge.
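To make the implicit eviction concrete, here is a small Python sketch that replays the two orderings above. The eviction rule (drop the lowest <<value, timestamp>> entry once MAX_SIZE is exceeded) and the tombstone rule (a DELETE only clears entries for the same value with an older timestamp) are assumptions for illustration; the actual TLA+ spec may differ in the details.

```python
MAX_SIZE = 2  # setting this below zero disables the bound, like the cfg tweak above

def apply_op(state, op):
    """Apply one ("INSERT"/"DELETE", value, timestamp) op to a set of (value, ts) pairs."""
    kind, value, ts = op
    if kind == "INSERT":
        state = state | {(value, ts)}
        if 0 <= MAX_SIZE < len(state):
            state = state - {min(state)}  # the implicit, size-triggered delete
    else:  # DELETE: tombstone entries for this value with an older timestamp
        state = {(v, t) for (v, t) in state if not (v == value and t < ts)}
    return state

def replay(ops):
    state = set()
    for op in ops:
        state = apply_op(state, op)
    return state

natural   = [("INSERT", 1, 1), ("INSERT", 2, 2), ("INSERT", 3, 3), ("DELETE", 3, 4)]
reordered = [("INSERT", 3, 3), ("DELETE", 3, 4), ("INSERT", 1, 1), ("INSERT", 2, 2)]

print(replay(natural))    # {(2, 2)}
print(replay(reordered))  # {(1, 1), (2, 2)} -- different result: not commutative
```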

There would seem to be a few interesting approaches to avoid this, but I will leave it at that for now!


ZacAttack avatar Jul 12 '24 20:07 ZacAttack

Interesting idea, and seemingly useful as well. I think the way we would implement this would likely be to have some sort of annotation in the schema which defines the collection as having a special "bounded" type. Therefore, since this property is defined within the value schema, we could generate a different derived WC schema. In particular, we could drop support for item removals (and likely full updates as well), and only support additions.
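As a purely hypothetical illustration (no such annotation exists in Venice today, and every name below is made up), the annotation could look something like a custom property on the collection field, which schema derivation would read in order to emit an add-only write-compute schema:

```python
# Hypothetical only: a value-schema field carrying a made-up "bounded" property,
# expressed as a Python dict mirroring an Avro field definition. The idea is that
# the derived write-compute schema for such a field would expose only additions
# (no item removals, and likely no full collection overwrites).
bounded_field = {
    "name": "recentItems",                       # made-up field name
    "type": {"type": "array", "items": "long"},
    "bounded": {"maxSize": 100},                 # made-up annotation and parameter
}
```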

FelixGV avatar Jul 12 '24 23:07 FelixGV

Interesting idea, and seemingly useful as well. I think the way we would implement this would likely be to have some sort of annotation in the schema which defines the collection as having a special "bounded" type. Therefore, since this property is defined within the value schema, we could generate a different derived WC schema. In particular, we could drop support for item removals (and likely full updates as well), and only support additions.

This is my thinking as well.

Regarding sets, they seem to work, but maybe they're actually confusing to implement? (We'd have to screen elements at the producer or in the server if the put for that collection exceeds the configured bound, and that might be a bit awkward.)

Additionally, there are 'some' ways we could potentially support deletes, but they're... not the best imo? At least the ways I've thought of. My (bad) ideas so far include:

  • Transmit the delete from the leader when we need to shrink the array: This would work and make things consistent, but it's kind of sketchy. The leader would have to produce to the RT in the ingestion task for this case, and that would kick up the latency. Not to mention that if an array is pretty chatty, we might end up producing quite a lot of records.
  • Only shrink the array when we push: We could shrink the array as part of a compaction repush in KIF; however, this only works if we only do pushes from a single colo, since pushing from individual colos still causes the above inconsistency. This might be ok given that this issue only applies to AA stores? But it has the downside of leaving the cluster in a potentially metastable state until the next push happens (if we have a lot of big rows due to overflowing arrays between pushes).
  • Split the array size between regions: We have the commutativity requirement because of the async ordering from consuming on N RTs, where N is the number of colos. However, writes from a single RT are strictly ordered, so if you split the array N ways and applied the size constraint per slice, it would be consistent (a rough sketch of this appears after the list). The downside, though, is that this is probably very hard for the user to reason about (the array would keep the top results from each colo?).
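A rough sketch of the third idea (splitting the bound by region), with the per-colo eviction rule (drop the oldest entry in that colo's slice) assumed purely for illustration:

```python
from collections import defaultdict

MAX_SIZE = 4
NUM_COLOS = 2
PER_COLO_BOUND = MAX_SIZE // NUM_COLOS

def apply_colo_insert(per_colo_state, colo, value, ts):
    """Inserts from one colo's RT arrive strictly ordered, so trimming that slice is deterministic."""
    bucket = per_colo_state[colo]
    bucket.append((value, ts))
    if len(bucket) > PER_COLO_BOUND:
        bucket.pop(0)  # each colo only ever evicts from its own slice

def merged_view(per_colo_state):
    # The reader-facing value is just the union of the per-colo slices.
    return {entry for bucket in per_colo_state.values() for entry in bucket}

state = defaultdict(list)
apply_colo_insert(state, "colo-0", "x", 1)
apply_colo_insert(state, "colo-0", "y", 2)
apply_colo_insert(state, "colo-0", "z", 3)  # evicts ("x", 1) from colo-0's slice only
apply_colo_insert(state, "colo-1", "w", 4)
print(merged_view(state))  # {("y", 2), ("z", 3), ("w", 4)}
```

Because each colo trims only its own slice, no cross-colo ordering is needed, but the user-visible result is "top results per colo" rather than a global top N.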

Another interesting thought about the schema annotation: we could also probably add custom comparators if we had a good annotation syntax. For example, we could have bounded collections which kept the elements with the highest cosineSimilarity(key_field) in the array (since the key doesn't change relative to the field). I'm only speculating that that would be useful; I don't actually know.
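For what it's worth, the comparator idea boils down to something like the following speculative sketch, where the score function would come from the annotation (cosine similarity against the key being one example; plain `len` stands in below):

```python
def bounded_insert(collection, element, max_size, score):
    """Insert and retain only the top `max_size` elements by `score`."""
    return sorted(collection + [element], key=score, reverse=True)[:max_size]

items = []
for word in ["a", "abcd", "ab", "abcde", "abc"]:
    items = bounded_insert(items, word, max_size=3, score=len)
print(items)  # ['abcde', 'abcd', 'abc']
```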

ZacAttack avatar Jul 13 '24 03:07 ZacAttack