scylla-cluster-tests

`disrupt_modify_table` can fail on TWCS

Open fruch opened this issue 2 years ago • 2 comments

Seems like some variants of modify table fail when run on top of TWCS tables.

2022-12-18 03:35:01.476: (DisruptionEvent Severity.ERROR) period_type=end event_id=dbbf318a-6e93-4345-829d-8925a1ff1736 duration=0s: nemesis_name=ModifyTable target_node=Node sct-cluster-us-east1-b-us-east1-0 [10.0.0.111 | 10.0.0.176] (seed: False) errors=<Error from server: code=2300 [Query invalid because of configuration issue] message="The setting of default_time_to_live=600017925 and compaction window=86400(s) can lead to 6944 windows, which is larger than the allowed number of windows specified by the twcs_max_window_count (50) parameter. Note that default_time_to_live=0 is also highly discouraged.">
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3831, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2469, in disrupt_modify_table
    disrupt_func()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2286, in modify_table_default_time_to_live
    self._modify_table_property(name="default_time_to_live", val=random.randint(864000, 630720000),
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1786, in _modify_table_property
    session.execute(cmd)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1551, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2699, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5006, in cassandra.cluster.ResponseFuture.result
cassandra.protocol.ConfigurationException: <Error from server: code=2300 [Query invalid because of configuration issue] message="The setting of default_time_to_live=600017925 and compaction window=86400(s) can lead to 6944 windows, which is larger than the allowed number of windows specified by the twcs_max_window_count (50) parameter. Note that default_time_to_live=0 is also highly discouraged.">

Maybe related to https://github.com/scylladb/scylla-cluster-tests/commit/98676bba2eb2ec9405d2a330a252433eb111ae83, which started modifying TWCS tables?
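The arithmetic behind the error can be reproduced with a small sketch. It assumes the server counts windows by integer division of `default_time_to_live` by the compaction window size; that is inferred from the numbers in the message, not taken from the Scylla source. The helper name `twcs_window_count` is hypothetical.

```python
TWCS_MAX_WINDOW_COUNT = 50  # limit quoted in the error message


def twcs_window_count(default_time_to_live, window_seconds):
    """Windows spanned by a TTL (floor division is an assumption, see above)."""
    return default_time_to_live // window_seconds


window = 86400  # compaction window from the error message, in seconds

# The value that failed in this run.
print(twcs_window_count(600017925, window))  # 6944, matches the error message

# Bounds currently passed to random.randint() in modify_table_default_time_to_live.
print(twcs_window_count(864000, window))     # 10
print(twcs_window_count(630720000, window))  # 7300

# Largest TTL that stays within the limit for this window size.
print(window * TWCS_MAX_WINDOW_COUNT)        # 4320000
```

Under that assumption, any TTL above 4320000 seconds drawn from the current range exceeds 50 windows on a table with a one-day window.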

Installation details

Kernel Version: 5.4.219-126.411.amzn2.x86_64
Scylla version (or git commit hash): 5.2.0~dev-20221217.b52bd9ef6aa9 with build-id dfbc8f017962f75db1c809914edd78a594cd80d7

Operator Image: scylladb/scylla-operator:latest
Operator Helm Version: v1.8.0-alpha.0-162-g7be1034
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 4 nodes (i3.4xlarge)

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: `` (k8s-eks: eu-north-1)

Test: longevity-scylla-operator-basic-12h-eks
Test id: 15a0edac-0089-43a5-a502-f6b5bde91d68
Test name: scylla-operator/operator-master/eks/longevity-scylla-operator-basic-12h-eks
Test config file(s):

  • Restore Monitor Stack command: $ hydra investigate show-monitor 15a0edac-0089-43a5-a502-f6b5bde91d68
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 15a0edac-0089-43a5-a502-f6b5bde91d68

Logs:

Jenkins job URL

fruch, Dec 18 '22 14:12

Got it again:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5063, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2787, in disrupt_modify_table
    disrupt_func()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2602, in modify_table_default_time_to_live
    self._modify_table_property(name="default_time_to_live", val=random.randint(864000, 4300000),
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1993, in _modify_table_property
    session.execute(cmd)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1824, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2699, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5018, in cassandra.cluster.ResponseFuture.result
cassandra.protocol.ConfigurationException: <Error from server: code=2300 [Query invalid because of configuration issue] message="The setting of default_time_to_live=3682544 and compaction window=3600(s) can lead to 1022 windows, which is larger than the allowed number of windows specified by the twcs_max_window_count (50) parameter. Note that default_time_to_live=0 is also highly discouraged.">

Packages

Scylla version: 5.5.0~dev-20240209.7a710425f0e2 with build-id f4777533fdd4b53d42e7a22dd9605eb4feb4fc5a
Kernel Version: 5.15.0-1053-aws

Installation details

Cluster size: 4 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

  • longevity-twcs-48h-master-db-node-afcdf00b-8 (3.248.225.149 | 10.4.10.254) (shards: 7)
  • longevity-twcs-48h-master-db-node-afcdf00b-7 (3.255.167.128 | 10.4.8.105) (shards: -1)
  • longevity-twcs-48h-master-db-node-afcdf00b-6 (18.201.133.180 | 10.4.11.69) (shards: -1)
  • longevity-twcs-48h-master-db-node-afcdf00b-5 (34.241.107.131 | 10.4.9.87) (shards: -1)
  • longevity-twcs-48h-master-db-node-afcdf00b-4 (52.16.163.66 | 10.4.8.161) (shards: 7)
  • longevity-twcs-48h-master-db-node-afcdf00b-3 (54.217.137.119 | 10.4.10.110) (shards: 7)
  • longevity-twcs-48h-master-db-node-afcdf00b-2 (34.244.67.253 | 10.4.9.248) (shards: 7)
  • longevity-twcs-48h-master-db-node-afcdf00b-1 (3.253.51.25 | 10.4.10.155) (shards: 7)

OS / Image: ami-07569b2b3deaa1ea9 (aws: undefined_region)

Test: longevity-twcs-48h-test
Test id: afcdf00b-2ebd-4914-8b13-651493354d53
Test name: scylla-master/longevity/longevity-twcs-48h-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor afcdf00b-2ebd-4914-8b13-651493354d53
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs afcdf00b-2ebd-4914-8b13-651493354d53

Logs:

Jenkins job URL Argus

enaydanov, Feb 13 '24 07:02

@aleksbykov @temichus

Can you please look into this one? I don't know what the calculation should be there, but the current one seems wrong (or doesn't cover all cases).
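One possible shape for the calculation, as a minimal sketch rather than a proposed patch: derive the upper bound from the table's own compaction window instead of using a fixed range. It assumes the server rejects a TTL when `default_time_to_live // window_seconds` is greater than `twcs_max_window_count`, which is inferred from the two error messages above. `pick_twcs_safe_ttl` is a hypothetical helper, and reading the window size from the table's compaction options is not shown.

```python
import random

TWCS_MAX_WINDOW_COUNT = 50  # value reported in the error messages above


def pick_twcs_safe_ttl(window_seconds, max_window_count=TWCS_MAX_WINDOW_COUNT, rng=random):
    """Hypothetical helper: pick a random TTL that stays within the window limit."""
    if window_seconds <= 0 or max_window_count <= 0:
        raise ValueError("window_seconds and max_window_count must be positive")
    # Largest TTL that still yields at most max_window_count windows.
    upper = window_seconds * max_window_count
    # Keep at least one full window so the TTL is never 0.
    lower = window_seconds
    return rng.randint(lower, upper)


# The two window sizes seen in this issue: 1 hour and 1 day.
for window in (3600, 86400):
    ttl = pick_twcs_safe_ttl(window)
    print(f"window={window}s ttl={ttl}s windows={ttl // window}")
```

With a one-hour window the upper bound is 180000 seconds, which is below the lower bound of 864000 in the current `random.randint(864000, 4300000)` call, so under this assumption no value from that range can pass on such a table. The fixed range only fits a one-day window, where the bound is 4320000 seconds.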

fruch, Feb 13 '24 08:02