orleans icon indicating copy to clipboard operation
orleans copied to clipboard

Cassandra Clustering implementation

Open rkargMsft opened this issue 1 year ago • 12 comments

Built off of https://github.com/OrleansContrib/OrleansCassandraUtils with the following major changes:

  • Target Orleans 8
  • Better multi-datacenter support
  • Allow multiple serviceId and clusterId to be stored in the same keyspace
  • Integration tests via Testcontainers
Microsoft Reviewers: Open in CodeFlow

rkargMsft avatar Mar 25 '24 18:03 rkargMsft

@dotnet-policy-service agree company="Microsoft"

rkargMsft avatar Mar 25 '24 18:03 rkargMsft

@ReubenBond Can this PR get flagged to run the integration tests? Specifically to validate that the new Cassandra tests:

"Category=${{ "Cassandra" }}&(Category=BVT|Category=SlowBVT|Category=Functional)"

rkargMsft avatar Apr 03 '24 15:04 rkargMsft

@rkargMsft done

ReubenBond avatar Apr 03 '24 15:04 ReubenBond

@ReubenBond Sorry, looks like this needs approval for each commit to run the tests again

rkargMsft avatar Apr 10 '24 00:04 rkargMsft

@ReubenBond One open question on a unit test pulled from MembershipTableTestBase that doesn't pass: https://github.com/dotnet/orleans/blob/84d06497d8663155fc2e969238ca90eb93f05f18/test/Extensions/Tester.Cassandra/Clustering/Cassandra.cs#L103-L157

This test expects to reject the insert of a membership entry if it already exists for the specific Silo. Cassandra doesn't have a way to support this: doing both a Light Weight Transaction (LWT) for the table version check and an Exists check for the entry table isn't something that it supports.

That said, it seems like it would require an incorrectly functioning Silo to even get in this situation. It would need to read the current membership table, see that an entry exists for this Silo, decide to insert a new entry anyway, increment to the correct New Table version, then insert the entry. No other Silo would be adding an entry that would conflict, so I think this would be something that wouldn't ever happen.

rkargMsft avatar Apr 10 '24 16:04 rkargMsft

@rwkarg that is very odd. Are you saying that INSERT ... IF NOT EXISTS is not supported under an LWT? They use this in their docs example: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertLWT.html. I may be misunderstanding

ReubenBond avatar Apr 10 '24 17:04 ReubenBond

Can't do both an IF version = :expected_version; (for the table version check) and an IF NOT EXISTS (for the "insert only" check) in the same query.

rkargMsft avatar Apr 10 '24 17:04 rkargMsft

Can't do both an IF version = :expected_version; (for the table version check) and an IF NOT EXISTS (for the "insert only" check) in the same query.

@ReubenBond Found a better way to explain the situation. INSERT allows only for IF NOT EXISTS UPDATE allows only for IF column = :value (or IF EXISTS). There isn't a way to mix those. Generally that isn't an issue. INSERT is for adding a new entry so why would you do a conditional check there? UPDATE is for changing an existing row so why would you make an IF NOT EXISTS check there?

Where this comes up is that the membership table has rows for each Silo entry but there's a static column (basically one cell that all rows share) on those rows for the overall membership table version. This creates the situation where, to meet the expectations of the commented out unit test, we need to both:

  • only write the row if there isn't an entry with the same primary key (ie. only if there isn't an entry for the Silo already)
  • only write the row if the static column for the table version is the expected, last read, value (for the CAS check)

Cassandra doesn't directly support doing both of those in a query. The recommendations I've seen to work around this are to have the application logic do the "does this row exist" check based on querying the data and then make the write as an UPDATE with the CAS check to ensure that the view the application had when checking for row existence is still the current view when making the write.

MembershipTableManager already makes this application side check by:

  • Checking if the existing table has an entry https://github.com/dotnet/orleans/blob/50eba309c16c73cacbe6aad2c809a771b3faa38c/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs#L410
  • Calling Insert or Update, as appropriate, with a CAS check https://github.com/dotnet/orleans/blob/50eba309c16c73cacbe6aad2c809a771b3faa38c/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs#L431-L438

rkargMsft avatar Apr 11 '24 15:04 rkargMsft

Given we have a version check, can the Cassandra implementation fail the insert if it knows that the version row is there already? The version check should be all we need here, given we have lightweight transactions for consistency

The recommendations I've seen to work around this are to have the application logic do the "does this row exist" check based on querying the data and then make the write as an UPDATE with the CAS check to ensure that the view the application had when checking for row existence is still the current view when making the write.

Essentially, this. If MembershipTableManager is already performing the check, then the behavior checked by this test shouldn't be necessary.

ReubenBond avatar Apr 11 '24 18:04 ReubenBond

can the Cassandra implementation fail the insert if it knows that the version row is there already?

Yes, this Cassandra implementation does that version check and will fail the query if the version is mismatched (this is covered by other tests).

That was the only open question I had. Otherwise I believe this is good to go, aside from the two Azure Pipeline check failures.

rkargMsft avatar Apr 11 '24 19:04 rkargMsft

@ReubenBond Are the failing dotnet.orleans Azure Pipelines checks something that I need to address?

It looks like it's running on Windows and there aren't any Cassandra container images for Windows.

rkargMsft avatar Apr 25 '24 15:04 rkargMsft

@ReubenBond @benjaminpetit Can we get CassandraCSharpDriver vetted for inclusion in dotnet-public?

It's Apache 2 licensed: https://www.nuget.org/packages/CassandraCSharpDriver

rkargMsft avatar Jun 25 '24 14:06 rkargMsft

@rkargMsft The one remaining thing: I think we should re-enable the commented-out test and implement this logic within CassandraClusteringTable, even if it's duplicating the pattern which MembershipTableManager already implements (and will be a little bit slower due to added round-trips): image

ReubenBond avatar Jul 08 '24 17:07 ReubenBond

I pushed an update to re-enable that test & add per-row ETags for silo entries

ReubenBond avatar Jul 08 '24 19:07 ReubenBond