cassandra
cassandra copied to clipboard
DSP-23003 Port batchlog_endpoint_strategy config
Port bug: DB-1367 - Add batchlog_endpoint_strategy config to improve batchlog endpoint selection
Jira https://datastax.jira.com/browse/DSP-23003
Compare to original DSE commit https://github.com/jasonstack/apollo/commit/29fecd1f49d2bb539da573786f93fd9b0ba205c4#diff-03e0ee2c01f2ca4d3b15d3a1fc4917d8737861c15d8c83cb73b352fe9da22d1a
This change set is different from that patch in that the changes take place in ReplicaPlans.java instead of batchlogManger.java to simplify future merge conflicts.
Adds original batchlog_endpoint_strategy options:
- random_remote (default)
- dynamic_remote
- dynamic
while also keeping compatibility with Allow to filter batchlog endpoints by preferring the local rack via existing perferLocal boolean parameter and also a new batchlog_endpoint_strategy: prefer_local
Confirmed failures are false alarms, see cassci bot comment https://github.com/datastax/cassandra/commit/09acaf9e026b657999cda4ca51cc1a4f6a2e20f0#commitcomment-140825091
@djatnieks thanks for the reivew, I appreciate the nits! I'll probably get back to this next week.
We've discussed trying to get this into OSS, but it seems complicated. They have an open ticket that hasn't had any activity in a year https://issues.apache.org/jira/browse/CASSANDRA-18120. The main hurdle is probably the fact that our solution uses DynamicSnitch and it's common practice to disable dynamic snitching, so it might not be a beneficial solution for OSS
LGTM.
A few minor concerns:
- the
dse.property prefix (i don't have an answer or concrete opinion to this though), see comment, - have we done multiplexing tests on this ?
- have you tried rebasing this patch to cassandra's trunk, and cassandra-5.0 ? (I don't see TCM touching any of this stuff, but i'd want to double-check that – it might influence how we/you want the patch done in earlier branches…)
Thanks @michaelsembwever! Planning on getting this ready for merge today.
I'll rebase, update CassandraRelevantProperties, rename the property, and double check it applies cleanly to cassandra's trunk, and cassandra-5.0.
I'm not aware of any multiplexing tests. With this being my first CC PR I'm not sure which tests I need to run before merging
FYI: you have some conflicts to fix in your branch as it has conflicts with the main branch
Pushed to address comments
- Rebased and resolved conflicts. PR was originally developed on
ds-trunkinstead ofvsearch... - Added
REQUIRED_BATCHLOG_REPLICA_COUNTtoCassandraRelevantProperties - and renamed the property from
dse.batchlog.required_replica_counttocassandra.batchlog.required_replica_count - renamed
BatchlogEndpointStrategy.dynamicSnitchtoBatchlogEndpointStrategy.useDynamicSnitchScores - tweaked
BatchlogEndpointStrategycomments inconfig.java
Latest run shows 1 new test failure https://jenkins-stargazer.aws.dsinternal.org/job/ds-cassandra-pr-gate/job/PR-1080/10/testReport/ Waiting for rerun to complete before merging https://jenkins-stargazer.aws.dsinternal.org/job/ds-cassandra-pr-gate/job/PR-1080/11/
Quality Gate passed
Issues
9 New issues
0 Accepted issues
Measures
0 Security Hotspots
84.2% Coverage on New Code
0.0% Duplication on New Code
Run 11 completed with no failures
Thanks Shayne!