Feature: zero token nodes
The new zero token nodes feature was introduced by https://github.com/scylladb/scylladb/pull/19684. These nodes are normal Scylla nodes, but they do not join the cluster's token ring and do not hold any user data. They are used by Raft and group0 to preserve the majority needed for Raft operations.
Zero token nodes can be created only at bootstrap, when scylla.yaml contains join_ring: false.
Zero token nodes are normal Scylla nodes and all operations are allowed on them (remove, decommission, replace; a zero token node can be replaced only with another zero token node).
Their main goals:
- preserve quorum and majority in a multi-DC configuration. Example: dc1: 3, dc2: 3. If one DC dies or becomes isolated, both DCs lose the majority and no topology or schema operation is allowed. If one more DC with a single zero token node is added (dc3: 1), the majority is preserved.
- because zero token nodes remain group0 voters, even in a single-DC cluster of 4 nodes, adding 1 zero token node can preserve the cluster majority: even if 2 nodes die, schema changes and topology operations are still allowed (see the sketch below).
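To make the voter arithmetic above concrete, here is a small illustrative sketch (it assumes, as described above, that every node including zero token nodes is a group0 voter):

```python
def has_majority(total_voters: int, alive_voters: int) -> bool:
    """Raft/group0 requires strictly more than half of the voters to be reachable."""
    return alive_voters > total_voters // 2

# dc1: 3 + dc2: 3 -> losing one DC leaves 3 of 6 voters: majority lost in both DCs.
assert not has_majority(total_voters=6, alive_voters=3)

# Add dc3 with one zero token node -> losing a data DC leaves 4 of 7 voters: majority kept.
assert has_majority(total_voters=7, alive_voters=4)

# Single DC with 4 data nodes + 1 zero token node -> 2 nodes down leaves 3 of 5: majority kept.
assert has_majority(total_voters=5, alive_voters=3)
```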
Zero token nodes do not contain user data, so a smaller instance type can be used for them. Drivers do not see zero token nodes and ignore them.
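For reference, a minimal sketch of the bootstrap-time configuration (a hypothetical helper; the only fact taken from the description above is that scylla.yaml must contain join_ring: false before the node's first start):

```python
import yaml

def mark_node_as_zero_token(scylla_yaml_path: str = "/etc/scylla/scylla.yaml") -> None:
    """Set join_ring: false so the node bootstraps as a zero token (tokenless) node."""
    with open(scylla_yaml_path) as config_file:
        config = yaml.safe_load(config_file) or {}
    config["join_ring"] = False  # must be in place before the node joins the cluster
    with open(scylla_yaml_path, "w") as config_file:
        yaml.safe_dump(config, config_file)
```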
To add support for zero token nodes in SCT:
- [ ] Add a special flag to the Node class which will be true if the node is a zero token node (see the sketch after this checklist)
- [ ] Add new sct_config parameters: the number of zero token nodes; add an additional DC with zero token nodes only
- [ ] Init the cluster with zero token nodes. Here we have 2 ways:
  - [ ] Using the new parameter, mark the last `num_zero_token` nodes in the list as zero token nodes. This solution has several benefits:
    - no need to fix the provisioner code, all backends will be supported (aws, gce, docker, k8s??)
    - if num_zero_tokens = 0, nothing changes
    - simple to implement
    - just add a new test with the correct yaml configuration

    and disadvantages:
    - zero token nodes will have the same instance type as normal db nodes; as long as a test does not have many of them, this is not a problem
    - if even 1 zero token node is enabled by default in the config, all config yaml files have to be reviewed so that the number of normal nodes matches the RF of the schemas. For example, if a test is configured with 3 nodes, one of them a zero token node, and RF = 3, then after provisioning the cluster will have 2 normal nodes and 1 zero token node. This could affect c-s and other stress tools.
  - [ ] Add code to the Provisioner (aws, azure) and to gce, docker, k8s?? and provision the correct number of nodes: n_db_nodes + nemesis_add_nodes + num_zero_token_nodes. Also add a new sct config parameter which allows setting a small instance type for zero token nodes. This solution has the following benefits:
    - the instance type can be set independently for normal nodes and zero token nodes
    - no need to review and fix all test config yamls

    and disadvantages:
    - quite a complex code refactor: the Provisioner and all related stuff need to be fixed, and new code added for gce, docker (we don't have a provisioner for them yet, am I right?)
    - additional checks are needed for the instance type, so that a small instance is not used as a normal node
    - others not found yet :)
- [ ] Add support for cluster operations and disrupt nemeses:
  - bootstrap a new node as a zero token node
  - decommission a zero token node and new zero token nodes
  - replace a zero token node with a zero token node only
  - remove a zero token node; if a new node is added after that, it should be a zero token node
- [ ] Cluster status and health. Currently nodetool status does not display zero token nodes: https://github.com/scylladb/scylladb/issues/19849. Once it is fixed, all zero token nodes will be displayed in the output as usual nodes. Until then, a workaround is needed so that the cluster health check and nodetool status work correctly and do not fail the test.
- [ ] New nemeses:
  - [ ] Create a new DC with zero token nodes only (the current nemesis could be improved for this)
  - [ ] Multi-DC (2 DCs): create a new DC with zero token nodes only and stop all nodes in one DC, check that the majority is not lost (another variation: 3 DCs are initialized, and the nemesis checks that there are 3 DCs and one of them has only zero token nodes, and only then runs, otherwise it is skipped)
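A rough sketch of the first two checklist items (the names follow the ones used later in this thread, e.g. `_zero_token_node` and `n_db_zero_token_nodes`, but the exact API is up to the implementation):

```python
class BaseNode:  # simplified stand-in for sdcm.cluster.BaseNode
    def __init__(self, name: str, zero_token: bool = False):
        self.name = name
        # True when the node was bootstrapped with join_ring: false
        self._zero_token_node = zero_token

    @property
    def is_zero_token_node(self) -> bool:
        return self._zero_token_node


# Hypothetical sct_config entries (illustrative only):
#   n_db_zero_token_nodes: 1                  # number of zero token nodes (or a per-DC string)
#   zero_token_instance_type_db: "t3.large"   # smaller instance type for zero token nodes
```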
Testing
The PR already contains the first way of cluster initialization. The job is done and a zero token node is added, but due to https://github.com/scylladb/scylladb/issues/19849 the cluster health check fails and the job is marked as failed.
PR pre-checks (self review)
- [ ] I added the relevant `backport` labels
- [ ] I didn't leave commented-out/debugging code
Reminders
- Add new configuration options and document them (in `sdcm/sct_config.py`)
- Add unit tests to cover my changes (under the `unit-test/` folder)
- Update the Readme/doc folder relevant to this change (if needed)
@soyacz @fruch @roydahan @vponomaryov @temichus I need your opinion on the implementation.
New changes are ready:
- Removed unused sct_config variables. Now only 2 variables are used: `n_db_zero_token_nodes` and `zero_token_instance_type_db`. `n_db_zero_token_nodes` has the same format as `n_db_nodes`: an int or a string like "0 0 1", which means we should have 3 DCs with a znode only in the 3rd one (see the parsing sketch after this list). `zero_token_instance_type_db` is the instance type of znodes; if it is not set, `db_instance_type` is used.
- Added functionality to the SCT AWS provisioner to create instances for znodes with the specified instance type. A znode instance gets the tag ZeroTokenNode=True, which lets SCT detect the znode and configure it correctly in scylla.yaml.
- SCT (AWS part only) can detect znodes and mark them with the new property `BaseNode._zero_token_node`. We can also create znodes with the AWS cluster (an appropriate parameter was added to the `add_nodes()` method).
- `BaseScyllaCluster` has 2 new properties: `data_nodes` and `zero_nodes`. Each returns the list of relevant nodes. `BaseScyllaCluster.nodes` was not renamed, because that would require a lot of changes in the SCT code; it still contains the full list of nodes (data nodes and znodes). This is also done because a znode is a regular Scylla node (without data) and most operations can be started from it.
- `Nemesis.get_target_node` is switched for now to pick the target node only from the data node pool.
- Updating all nemeses and SCT operations to use the specified pool of nodes is in progress.
- Added 2 new nemeses:
  - bootstrap/decommission a znode in the cluster (single DC and multi-DC)
  - add/remove a new DC with znodes only (single DC); the multi-DC variant is in progress
- Added a workaround for the health validator and `wait_node_up` to get the node status from the `system.cluster_status` table instead of nodetool status.
- Added 2 new jobs: single DC and multi-DC with znodes configured.
  - both jobs have passed with regular nemeses
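The parsing sketch mentioned above: how the per-DC `n_db_zero_token_nodes` value and the new node pools could look (simplified; `data_nodes`/`zero_nodes` mirror the properties described above, and `_zero_token_node` is the per-node flag):

```python
def parse_nodes_per_dc(value) -> list[int]:
    """'0 0 1' -> [0, 0, 1]; a plain int means a single DC, same format as n_db_nodes."""
    if isinstance(value, int):
        return [value]
    return [int(part) for part in str(value).split()]


class BaseScyllaClusterPools:  # simplified view of the new properties
    def __init__(self, nodes):
        self.nodes = nodes  # full list: data nodes + zero token nodes (unchanged)

    @property
    def data_nodes(self):
        return [node for node in self.nodes if not getattr(node, "_zero_token_node", False)]

    @property
    def zero_nodes(self):
        return [node for node in self.nodes if getattr(node, "_zero_token_node", False)]
```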
I would recommend adding follow-up issues for the other backends; we can help with docker/gce/azure, and the cloud/operator team can help with the k8s backend, as they will need it as well at some point.
Fixed according to the comments. New:
- Added a new global flag `use_zero_nodes` which allows enabling/disabling zero node support in SCT. If it is disabled, SCT configures and uses only data nodes, all nemeses related to zero nodes only are skipped, and all nemeses which can work with any nodes work only with data_nodes.
- The Nemesis class has a new attribute `target_node_pool` and a method `set_target_node_pool`. This allows explicitly configuring the pool of nodes from which the nemesis target node is chosen. Several nemeses are marked with a decorator which explicitly sets `target_node_pool`; by default `data_nodes` is used as the target pool (see the sketch after this list).
- Marked several disrupt_* methods to use the appropriate pool.
- Went through the SCT code and switched it to use data_nodes instead of all nodes (self.nodes) where needed.
- Added the appropriate flag to the zero node related yamls and jobs.
- 2 testing jobs were added and will be used for testing zero nodes; all other jobs will continue to work without zero nodes for a while.
- Added a new nemesis flag `zero_node_changes` which allows selecting nemeses which can work with zero nodes.
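A minimal sketch of how `target_node_pool` and the decorator could fit together (the enum and the names here are illustrative; the PR's actual decorator may differ):

```python
from enum import Enum
from functools import wraps
import random


class NodePool(Enum):
    DATA_NODES = "data_nodes"
    ZERO_NODES = "zero_nodes"
    ALL_NODES = "nodes"


def use_target_node_pool(pool: NodePool):
    """Decorator for disrupt_* methods that need a non-default target node pool."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            previous = self.target_node_pool
            self.set_target_node_pool(pool)
            try:
                return method(self, *args, **kwargs)
            finally:
                self.set_target_node_pool(previous)
        return wrapper
    return decorator


class Nemesis:  # simplified stand-in for sdcm.nemesis.Nemesis
    def __init__(self, cluster):
        self.cluster = cluster
        self.target_node_pool = NodePool.DATA_NODES  # default pool

    def set_target_node_pool(self, pool: NodePool):
        self.target_node_pool = pool

    def get_target_node(self):
        # pick the target only from the currently configured pool
        candidates = getattr(self.cluster, self.target_node_pool.value)
        return random.choice(candidates)

    @use_target_node_pool(NodePool.ALL_NODES)
    def disrupt_example_zero_node_friendly(self):
        return self.get_target_node()
```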
@patjed41 @soyacz , can you review again?
@soyacz , fixed per your comments. Can you take a look?
I took a look at risky places (possibly not all of them) and got to this:
to fix:
`sdcm.cluster.BaseCluster.run_node_benchmarks` - should be run on data nodes
Fixed
`sdcm.cluster.BaseScyllaCluster.get_rack_nodes` - used in the k8s context, replaces db node with data node.
I left it as is, because k8s is not supported yet. I think it is better to fix this when k8s support for zero nodes is added. For now, for K8S self.nodes == self.data_nodes.
`longevity_test.LongevityTest.test_custom_time` - in case `cluster_target_size` is set, it shouldn't take zero nodes into account, I believe
Set to use data_nodes for counting
- places where we use `adaptive_timeout` - should operate on data nodes; fix places where we used cluster.nodes[0]
fixed
`longevity_test.LongevityTest._run_validate_large_collections_in_system` and `longevity_test.LongevityTest._run_validate_large_collections_warning_in_logs` should operate on data nodes, I think
Fixed.
`longevity_test.LongevityTest.all_node_ips_for_stress_command` - also should operate on data nodes, I think
Zero nodes can be query coordinators and the driver filters them out, but set to data_nodes for safety
`longevity_tombstone_gc_test.TombstoneGcLongevityTest._run_repair_and_major_compaction` - please verify
Switched to data_nodes
`sdcm.cluster.BaseScyllaCluster.start_kms_key_rotation_thread` should be adjusted to use data nodes, I think
It is related to scylla config, so it is better to run it on all nodes
`sdcm.nemesis.Nemesis.disrupt_load_and_stream` - verify
Fixed. Set to use data nodes explicitly
`SstableLoadUtils.distribute_test_files_to_cluster_nodes`
It uses the passed node list, so the fix above should also cover this one
`disrupt_nodetool_refresh` - `shards_num = self.cluster.nodes[0].scylla_shards` can pick a zero node and return a wrong value; also I don't think this disruption should upload sstables to all nodes
fixed
`disrupt_nodetool_enospc` - this nemesis might not work with zeros, same for `disrupt_end_of_quota_nemesis`
Set to data_nodes
`check_that_sstables_are_encrypted` - this will fail
Set to use only data nodes; it should work. I will trigger a job to check it
`_switch_to_network_replication_strategy` - `nodes_by_region` takes zero nodes into account
Set to use data nodes
`disrupt_add_remove_dc` - also needs fixing in a couple of places
Updated with data nodes
`disrupt_add_remove_mv` - should use data nodes as the target, I think (btw. will `sdcm.utils.nemesis_utils.indexes.wait_for_view_to_be_built` work?)
yes, it should work
`sdcm/nemesis.py:5302` - nemesis wrapper - we verify `if num_nodes_before != num_nodes_after:`; this should check that we have the same number of zero nodes and data nodes, I think
Added checks for both
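For illustration, the extended wrapper check could compare both pools separately (a sketch, not the exact code at sdcm/nemesis.py:5302):

```python
def assert_node_counts_unchanged(cluster, run_disruption) -> None:
    """Verify a disruption leaves both the data node and zero node pools at their original size."""
    data_before, zero_before = len(cluster.data_nodes), len(cluster.zero_nodes)
    run_disruption()
    data_after, zero_after = len(cluster.data_nodes), len(cluster.zero_nodes)
    assert data_before == data_after, "number of data nodes changed during the nemesis"
    assert zero_before == zero_after, "number of zero token nodes changed during the nemesis"
```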
`sdcm.scan_operation_thread.FullscanOperationBase` should possibly also be based on data nodes
yep, fixed
`sdcm.tombstone_gc_verification_thread.TombstoneGcVerificationThread` - also should work with data nodes (validates sstables)
set to data_nodes
questions:
- Generally, shouldn't all db cql sessions be created with data nodes?
No, a zero node can be a query coordinator; it sends the query request to the required data node.
- Shouldn't all stress threads be based on data nodes too?
Same as above. Drivers will automatically ignore zero nodes even if they are set in -nodes. The only limitation is for multi-DC: for a DC with zero nodes only, replication factor 0 has to be set explicitly. Auto-resizing the RF does not work yet; an issue was opened for that.
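For example, based on the statement above about the explicit RF 0, a keyspace in a three-DC cluster where dc3 holds only zero token nodes would be altered roughly like this (a sketch using the Python driver; the keyspace and DC names are made up):

```python
from cassandra.cluster import Cluster  # python driver, as used by SCT

session = Cluster(contact_points=["10.0.0.1"]).connect()
# dc3 contains only zero token nodes, so its replication factor is set to 0 explicitly
session.execute(
    "ALTER KEYSPACE ks WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 0}"
)
```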
`sdcm.cluster.BaseScyllaCluster.set_seeds` - can a zero node be a seed?
Yes. The only limitation for a zero node is that it must not be the first node in the cluster, but in SCT zero nodes are added last.
`sdcm.cluster.BaseScyllaCluster.wait_for_schema_agreement` - is the schema agreement check the same on zero nodes?
Yes, the schema should be the same on all nodes. This is guaranteed by Raft.
`sdcm.cluster.BaseScyllaCluster.wait_all_nodes_un` uses `check_nodes_up_and_normal`, which is based on nodetool status; will it work?
I added a workaround while the issue is not fixed: the status is read from system tables and produces the same dict as nodetool status.
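A sketch of that workaround (the column names of the `system.cluster_status` virtual table are my assumption; the dict just mirrors the shape that nodetool status parsing produces):

```python
def node_status_from_system_table(session) -> dict:
    """Build a nodetool-status-like mapping from the system.cluster_status virtual table."""
    rows = session.execute(
        "SELECT peer, dc, host_id, status, up FROM system.cluster_status"
    )
    status_map = {}
    for row in rows:
        status_map[str(row.peer)] = {
            "dc": row.dc,
            "host_id": str(row.host_id),
            "state": "U" if row.up else "D",  # Up/Down, like the first letter in nodetool output
            "status": row.status,             # e.g. NORMAL, like the second letter (N)
        }
    return status_map
```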
`sdcm.cluster.BaseScyllaCluster.verify_decommission` also uses nodetool status under the hood
Same workaround applies.
`sdcm.cluster.BaseScyllaCluster.get_db_nodes_cpu_mode` - is the perftune yaml always present on zero nodes?
For now zero nodes always run on instance types supported by Scylla, and perftune is present. We could change that later.
`sdcm/commit_log_check_thread.py` - verify how zero nodes should be counted here, possibly the same, but I'm not sure.
Looks the same.
- Do we need scylla manager agent on zero nodes?
An issue was opened for scylla manager zero node support. But I think yes.
- Do we need special treatment of zero nodes in upgrade tests? Support in upgrade tests can be done in a follow-up task.
It will be one of the next tasks. From the upgrade perspective nothing should change, and zero nodes are upgraded as usual.
- Does nodetool cleanup work on zeros?
Yes, it finishes very fast.
- Won't `disrupt_validate_hh_short_downtime` fail when validating UN with nodetool on all nodes?
The workaround should fix it.
- Won't `sdcm.cluster.BaseScyllaCluster.get_node_status_dictionary` fail if a zero node is used?
The workaround should help.
`sdcm.utils.compaction_ops.CompactionOps` - should work with data nodes
Switched
to check:
`sdcm.cluster.BaseScyllaCluster.get_node` - it is used for nosqlbench as a target and in other places, see if it shouldn't be pinned to data nodes
Set to use data_nodes
`sdcm.kafka.kafka_cluster.LocalKafkaCluster` - doesn't it break the kafka connector?
It shouldn't.
- Validate SLA tests (possibly another follow-up task)
Yes, it will be another task; for now Enterprise 2024.2 does not support zero nodes yet.
- Same for alternator tests (possibly another follow-up task)
Yes, it will be checked in another task by enabling the flag for the appropriate job.
> `sdcm.cluster.BaseScyllaCluster.start_kms_key_rotation_thread` should be adjusted to use data nodes, I think
> It is related to scylla config, so it is better to run it on all nodes

But inside it verifies sstables of user keyspaces - this will fail. This part needs to be adjusted.

to check:
> `sdcm.cluster.BaseScyllaCluster.get_node` - it is used for nosqlbench as a target and in other places, see if it shouldn't be pinned to data nodes
> Set to use data_nodes

In that case, if drivers ignore zero nodes, this change is not required.

> `sdcm.kafka.kafka_cluster.LocalKafkaCluster` - doesn't it break the kafka connector?
> It shouldn't.

Worth scheduling a test for it (can be done later, and raise a scylla issue if it fails).
More notes:
I didn't verify if you indeed fixed all my noted places - too hard, so make sure you pushed all changes.
See where we use nodetool status and adjust the places that use it - I might have missed something.
Also, you can double-check all the places where cluster.nodes or self.nodes is used - I did it, but could have missed some places as it's quite mundane work.
Let us know when you finish it and it is ready for merge. Also rerun all the tests after the recent updates. Include 1-2 artifact tests to verify they don't break, as this is a showstopper for Scylla master and we don't want to affect that (they run on GCE, so it will verify whether something required is missing there). Run also one Azure test (it can be an artifact test too).
> `sdcm.cluster.BaseScyllaCluster.start_kms_key_rotation_thread` should be adjusted to use data nodes, I think
> It is related to scylla config, so it is better to run it on all nodes
> But inside it verifies sstables of user keyspaces - this will fail. This part needs to be adjusted.

Yep, I found what you mean. Fixed to use data nodes.

> to check:
> `sdcm.cluster.BaseScyllaCluster.get_node` - it is used for nosqlbench as a target and in other places, see if it shouldn't be pinned to data nodes
> Set to use data_nodes
> In that case, if drivers ignore zero nodes, this change is not required.

Reverted.

> `sdcm.kafka.kafka_cluster.LocalKafkaCluster` - doesn't it break the kafka connector?
> It shouldn't.
> Worth scheduling a test for it (can be done later, and raise a scylla issue if it fails).

Added it to the plan. Can you point me to the kafka job?

> More notes: I didn't verify if you indeed fixed all my noted places - too hard, so make sure you pushed all changes. See where we use nodetool status and adjust the places that use it - I might have missed something. Also, you can double-check all the places where cluster.nodes or self.nodes is used - I did it, but could have missed some places as it's quite mundane work.

I've done that, but could also have missed something. Anyway, for now only several specific jobs will run with zero tokens enabled; by default they stay disabled and most of the jobs should work as expected. I will run all jobs with zero tokens enabled one by one to minimize possible crashes.

> Let us know when you finish it and it is ready for merge. Also rerun all the tests after the recent updates. Include 1-2 artifact tests to verify they don't break, as this is a showstopper for Scylla master and we don't want to affect that (they run on GCE, so it will verify whether something required is missing there). Run also one Azure test (it can be an artifact test too).

I will update job results here:
> `sdcm.kafka.kafka_cluster.LocalKafkaCluster` - doesn't it break the kafka connector?
> It shouldn't.
> Worth scheduling a test for it (can be done later, and raise a scylla issue if it fails).
> Added it to the plan. Can you point me to the kafka job?

https://jenkins.scylladb.com/view/master/job/scylla-master/job/kafka_connectors/ , but speak with @dimakr, as new tests were added recently but I think we still miss jenkins jobs for them.

> More notes: I didn't verify if you indeed fixed all my noted places - too hard, so make sure you pushed all changes. See where we use nodetool status and adjust the places that use it - I might have missed something. Also, you can double-check all the places where cluster.nodes or self.nodes is used - I did it, but could have missed some places as it's quite mundane work.
> I've done that, but could also have missed something. Anyway, for now only several specific jobs will run with zero tokens enabled; by default they stay disabled and most of the jobs should work as expected. I will run all jobs with zero tokens enabled one by one to minimize possible crashes.

Sure, just minimizing the number of problems - we have many nemeses and it's better to pass as many as possible in the first run.

> https://jenkins.scylladb.com/view/master/job/scylla-master/job/kafka_connectors/ , but speak with @dimakr, as new tests were added recently but I think we still miss jenkins jobs for them.

Right, we have jobs for the kafka source connector (scylla > kafka direction of change replication). The job for the kafka sink connector (kafka > scylla direction) is not there yet, but the test itself can easily be triggered using the existing longevity jobs and the kafka sink connector specific test config.
All jobs were run with zero node support disabled but with the new node pools:
- .nodes = all nodes
- .data_nodes = only data nodes
- .zero_nodes = only zero nodes

and with the new nemesis target_node_pool, which is explicitly set to use data_nodes as the target.
Artifact jobs:
- Passed - https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/artifacts-ubuntu-2204-test/7
- Passed - https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/artifacts-azure-image-test/8
- FAILED - https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/artifacts-amazon2-test/2 - Failed with same error as in master
Individual nemeses, passed (from the PR point of view). The jobs are marked as failed, but the errors were found in teardown; there are no errors in the code or in nemeses related to the PR changes:
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/abykov-nemesis-individuals/job/longevity-5gb-1h-AbortRepairMonkey-aws-test/
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/abykov-nemesis-individuals/job/longevity-5gb-1h-AddDropColumnMonkey-aws-test/
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/abykov-nemesis-individuals/job/longevity-5gb-1h-AddRemoveDcNemesis-aws-test/
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/abykov-nemesis-individuals/job/longevity-5gb-1h-AddRemoveMvNemesis-aws-test/
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/abykov-nemesis-individuals/job/longevity-5gb-1h-ClusterRollingRestart-aws-test/
Regular jobs:
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-3h-azure-test/5 (failed due to spot termination); nemeses passed, no code errors
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-3h-10gb-gce-test/6 (failed with c-s errors in teardown); nemeses passed, no code errors
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-50gb-3days-test/5 (failed, but not related to the PR changes)
Jobs with zero node support enabled and nemeses which run with zero nodes:
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-multi-dc-rack-aware-zero-token-dc/37 - Job passed. Found only log errors and c-s errors
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-4h-100gb-zero-nodes-larger-than-data-nodes/2 - Passed
- https://jenkins.scylladb.com/job/scylla-staging/job/abykov/job/longevity-100gb-4h-zero-token-nodes/87/
@soyacz can you take a look?
@soyacz, I removed the workaround. I think the PR is ready to be merged, and it has a chance to run over the weekend for all jobs (if something breaks, it can be reverted on Monday).