gemini
validation stuck for hours, with no clear reason
Issue description
gemini gets stuck in the validation phase, looping over it again and again, with no clear indication of what is failing:
{"L":"INFO","T":"2022-12-04T04:33:06.750Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2022-12-04T05:08:02.229Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.237Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.238Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.238Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.239Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.240Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.240Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
eventually ending in a timeout on the SCT side.
Installation details
Kernel Version: 5.15.0-1026-aws
Scylla version (or git commit hash): 5.2.0~dev-20221203.1a6bf2e9ca02
with build-id 68462c978ff36fe6852b5957b71920bc0246832e
Cluster size: 3 nodes (i3.large)
Scylla Nodes used in this run:
- gemini-with-nemesis-3h-normal-maste-oracle-db-node-7e1d63a0-1 (34.246.183.123 | 10.4.1.144) (shards: 2)
- gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-4 (54.75.95.183 | 10.4.1.172) (shards: 2)
- gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-3 (34.253.83.197 | 10.4.2.133) (shards: 2)
- gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-2 (52.208.233.47 | 10.4.1.201) (shards: 2)
- gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-1 (54.194.60.216 | 10.4.0.25) (shards: 2)
OS / Image: ami-0d0c1a60290b29aa9
(aws: eu-west-1)
Test: gemini-3h-with-nemesis-test
Test id: 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc
Test name: scylla-master/gemini-/gemini-3h-with-nemesis-test
Test config file(s):
-
Restore Monitor Stack command:
$ hydra investigate show-monitor 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc
Logs:
- db-cluster-7e1d63a0.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/7e1d63a0-dd49-4ed4-916e-c8d571b34bcc/20221204_085528/db-cluster-7e1d63a0.tar.gz
- monitor-set-7e1d63a0.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/7e1d63a0-dd49-4ed4-916e-c8d571b34bcc/20221204_085528/monitor-set-7e1d63a0.tar.gz
- loader-set-7e1d63a0.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/7e1d63a0-dd49-4ed4-916e-c8d571b34bcc/20221204_085528/loader-set-7e1d63a0.tar.gz
- sct-runner-7e1d63a0.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/7e1d63a0-dd49-4ed4-916e-c8d571b34bcc/20221204_085528/sct-runner-7e1d63a0.tar.gz
- parallel-timelines-report-7e1d63a0.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/7e1d63a0-dd49-4ed4-916e-c8d571b34bcc/20221204_085528/parallel-timelines-report-7e1d63a0.tar.gz