Implement ShardManagerReplicaAware class to align UCS and replica shards
Implement ShardManagerReplicaAware to align UCS and replica shards and thus limit the number of sstables that are partially owned by replicas.
The most interesting details are in the IsolatedTokenAllocator#allocateTokens and the ShardManagerReplicaAware#computeBoundaries methods.
In the allocateTokens method, we take the current token metadata for the cluster, replace the snitch with one that does not gossip, and allocate new nodes until we have produced the requested number of additionalSplits. By using the token allocation algorithm, high-level split points naturally align with replica shards as new nodes are added.
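A minimal, self-contained sketch of that loop, using long values as stand-in tokens (hypothetical names; the random generator below is only a placeholder for the real token allocation algorithm, and this is not the PR's IsolatedTokenAllocator code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

// Conceptual sketch only: "add" simulated nodes until enough new split points
// exist. The real code clones the cluster's token metadata, swaps in a
// non-gossiping snitch, and asks the token allocation algorithm for each
// simulated node's tokens, so the split points land where future replica
// boundaries would land.
final class IsolatedTokenAllocatorSketch
{
    static List<Long> allocateTokens(int additionalSplits, int tokensPerNode)
    {
        List<Long> newTokens = new ArrayList<>();
        SplittableRandom placeholderAllocator = new SplittableRandom(42); // stand-in only
        while (newTokens.size() < additionalSplits)
        {
            // Simulate adding one more node and collect the tokens it would own.
            for (int i = 0; i < tokensPerNode && newTokens.size() < additionalSplits; i++)
                newTokens.add(placeholderAllocator.nextLong());
        }
        return newTokens;
    }
}
```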
In computeBoundaries, we allocate any tokens needed, then we split the space into even spans and find the nearest replica token boundaries.
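A similarly simplified sketch of the snapping step in computeBoundaries, assuming the replica tokens (current plus any newly allocated ones) have already been collected into a sorted set; long values stand in for real Tokens, and the token range is assumed not to overflow when subtracted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustration only; the real computeBoundaries works on Cassandra Tokens.
final class ReplicaAlignedBoundariesSketch
{
    // Split [min, max] into shardCount even spans, then snap each interior
    // boundary to the nearest replica-owned token.
    static List<Long> computeBoundaries(long min, long max, int shardCount, TreeSet<Long> sortedReplicaTokens)
    {
        List<Long> boundaries = new ArrayList<>();
        double span = (double) (max - min) / shardCount;
        for (int i = 1; i < shardCount; i++)
        {
            long even = min + (long) (span * i);
            Long floor = sortedReplicaTokens.floor(even);
            Long ceil = sortedReplicaTokens.ceiling(even);
            Long nearest;
            if (floor == null) nearest = ceil;
            else if (ceil == null) nearest = floor;
            else nearest = (even - floor <= ceil - even) ? floor : ceil;
            boundaries.add(nearest != null ? nearest : even);
        }
        boundaries.add(max); // the last shard always ends at the top of the range
        return boundaries;
    }
}
```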
@blambov - please take a look. This is my initial take on ShardManagerNodeAware. I left several TODOs in the code as questions for you. I haven't had a chance to do any testing yet; I am looking for general feedback on the direction and on my understanding of what is necessary here. Thanks.
@blambov - do you mind taking another look? Thanks!
Marking as ready for review. There is still work to be done, but this will let tests run.
Could we temporarily change DEFAULT_IS_NODE_AWARE to true to run tests with it?
Looks like we hit an UnsupportedOperationException in several tests. Checking to see if it's an issue.
java.lang.UnsupportedOperationException: Token type BytesToken does not support token allocation.
at org.apache.cassandra.dht.ByteOrderedPartitioner$BytesToken.size(ByteOrderedPartitioner.java:134)
at org.apache.cassandra.db.compaction.ShardManagerTokenAware$TokenAlignedShardTracker.rangeSpanned(ShardManagerTokenAware.java:285)
at org.apache.cassandra.db.compaction.ShardTracker.applyTokenSpaceCoverage(ShardTracker.java:78)
at org.apache.cassandra.db.compaction.ShardManagerTokenAware$TokenAlignedShardTracker.applyTokenSpaceCoverage(ShardManagerTokenAware.java:305)
at org.apache.cassandra.db.compaction.unified.ShardedMultiWriter.prepareToCommit(ShardedMultiWriter.java:256)
at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1406)
at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1315)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
@blambov - at this point, the remaining test failures appear to be from disk boundaries attempting to combine with the token-aware shard manager:
Caused by: java.lang.IllegalArgumentException: Cannot use node aware strategy with disk boundaries
at org.apache.cassandra.db.compaction.ShardManager.create(ShardManager.java:51)
at org.apache.cassandra.db.compaction.UnifiedCompactionStrategy.maybeUpdateSelector(UnifiedCompactionStrategy.java:400)
at org.apache.cassandra.db.compaction.UnifiedCompactionStrategy.createSSTableMultiWriter(UnifiedCompactionStrategy.java:351)
at org.apache.cassandra.db.compaction.UnifiedCompactionContainer.createSSTableMultiWriter(UnifiedCompactionContainer.java:327)
at org.apache.cassandra.db.ColumnFamilyStore.createSSTableMultiWriter(ColumnFamilyStore.java:735)
at org.apache.cassandra.db.ColumnFamilyStore.createSSTableMultiWriter(ColumnFamilyStore.java:730)
at org.apache.cassandra.db.memtable.Flushing.createFlushWriter(Flushing.java:303)
at org.apache.cassandra.db.memtable.Flushing.flushRunnable(Flushing.java:135)
at org.apache.cassandra.db.memtable.Flushing.flushRunnables(Flushing.java:96)
at org.apache.cassandra.db.memtable.Flushing.flushRunnables(Flushing.java:73)
at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1360)
at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1315)
I think https://github.com/datastax/cassandra/pull/1255/commits/073452c45f4503f0971ac4b262dcb46de007e946 addressed a legitimate issue, but https://github.com/datastax/cassandra/pull/1255/commits/43624fc6a385c4f70a368026530662f4f11ec65b seems a bit more questionable.
What are your thoughts?
java.lang.UnsupportedOperationException: Token type BytesToken does not support token allocation normally means the test is using ByteOrderedPartitioner. Since token allocation cannot work with that partitioner anyway, there's no point in trying to fix those tests.
Actually, we should not instantiate ShardManagerTokenAware if the partitioner for the table does not support splitting/sizing (!partitioner.splitter().isPresent()), and we should probably push its selection to after the test-specific adjustments here.
If we do the above, neither of the two fixes above should be needed.
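Something along these lines, sketched with stand-in types rather than the real ShardManager/IPartitioner classes (the names below are hypothetical):

```java
import java.util.Optional;

// Minimal self-contained sketch of the suggested guard; PartitionerLike and
// the shard-manager types here are placeholders, not Cassandra's classes.
final class ShardManagerSelectionSketch
{
    interface PartitionerLike { Optional<Object> splitter(); }
    interface ShardManagerLike {}
    static final class TokenAwareShardManager implements ShardManagerLike {}
    static final class DefaultShardManager implements ShardManagerLike {}

    // Only pick the token-aware manager when the partitioner can split/size
    // token ranges; a partitioner like ByteOrderedPartitioner returns an empty
    // splitter, so it falls back to the default manager instead of throwing
    // later during flush.
    static ShardManagerLike create(boolean tokenAwareEnabled, PartitionerLike partitioner)
    {
        if (tokenAwareEnabled && partitioner.splitter().isPresent())
            return new TokenAwareShardManager();
        return new DefaultShardManager();
    }
}
```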
@blambov - at this point, UnifiedCompactionDensitiesTest is the only remaining compaction test that fails. Because I parameterized ShardedMultiWriterTest and added specific assertions that the distributions sum to approximately 1 and that the max number of tokens spanned is respected, I think we have the spirit of UnifiedCompactionDensitiesTest covered. I am going to disable DEFAULT_IS_NODE_AWARE now.
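For reference, the assertions are along these lines (a hypothetical helper, not the actual ShardedMultiWriterTest code):

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

// Sketch of the kind of checks described above.
final class ShardCoverageAssertions
{
    // Per-writer token-space coverage should sum to ~1, and no writer should
    // span more token boundaries than the configured maximum.
    static void assertCoverage(double[] perSSTableCoverage, int[] tokensSpanned, int maxTokensSpanned)
    {
        double total = 0;
        for (double c : perSSTableCoverage)
            total += c;
        assertEquals("coverage fractions should sum to ~1", 1.0, total, 0.01);
        for (int spanned : tokensSpanned)
            assertTrue("sstable spans too many tokens: " + spanned, spanned <= maxTokensSpanned);
    }
}
```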
Quality Gate passed
Issues: 5 new issues, 0 accepted issues
Measures: 0 security hotspots, 86.6% coverage on new code, 0.0% duplication on new code