validation stuck for hours, with no clear reason

Open fruch opened this issue 2 years ago • 0 comments

Issue description

gemini gets stuck in the validation phase, looping over it again and again, with no clear indication of what was failing:

{"L":"INFO","T":"2022-12-04T04:33:06.750Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2022-12-04T05:08:02.229Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.237Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.238Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.238Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.239Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.240Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}
{"L":"INFO","T":"2022-12-04T05:08:02.240Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}

eventually ending in a timeout on the SCT side
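For triage, a minimal sketch of how the log excerpt above could be summarized: it groups the JSON lines by logger and message, counts them, and records the first and last timestamp of each group. This is not part of gemini or SCT; `summarize` and `SAMPLE` are hypothetical names, and the only assumption about the log format is the `L`/`T`/`N`/`M` fields visible in the excerpt.

```python
import json
from collections import Counter
from datetime import datetime

# Three lines copied from the excerpt above; a real run would read the gemini log file.
SAMPLE = [
    '{"L":"INFO","T":"2022-12-04T04:33:06.750Z","N":"generator","M":"starting partition key generation loop"}',
    '{"L":"INFO","T":"2022-12-04T05:08:02.229Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}',
    '{"L":"INFO","T":"2022-12-04T05:08:02.237Z","N":"validation_job","M":"validation failed for possible async operation","trying_again_in":"500ms"}',
]


def summarize(lines):
    """Return {(logger, message): (count, first_timestamp, last_timestamp)}."""
    counts = Counter()
    first, last = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        key = (record["N"], record["M"])
        # Timestamps in the excerpt are UTC with millisecond precision and a trailing "Z".
        ts = datetime.strptime(record["T"], "%Y-%m-%dT%H:%M:%S.%fZ")
        counts[key] += 1
        first.setdefault(key, ts)  # keep the earliest occurrence
        last[key] = ts             # overwrite so the latest occurrence wins
    return {key: (counts[key], first[key], last[key]) for key in counts}


if __name__ == "__main__":
    for (logger, message), (count, start, end) in summarize(SAMPLE).items():
        print(f"{logger}: {message!r} x{count}, from {start} to {end}")
```

Run against the full log, this would show how many retries were logged and over what time span, which may help tell a single job retrying every 500ms from several validation jobs failing at once.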

Installation details

Kernel Version: 5.15.0-1026-aws

Scylla version (or git commit hash): 5.2.0~dev-20221203.1a6bf2e9ca02 with build-id 68462c978ff36fe6852b5957b71920bc0246832e

Cluster size: 3 nodes (i3.large)

Scylla Nodes used in this run:

  • gemini-with-nemesis-3h-normal-maste-oracle-db-node-7e1d63a0-1 (34.246.183.123 | 10.4.1.144) (shards: 2)
  • gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-4 (54.75.95.183 | 10.4.1.172) (shards: 2)
  • gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-3 (34.253.83.197 | 10.4.2.133) (shards: 2)
  • gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-2 (52.208.233.47 | 10.4.1.201) (shards: 2)
  • gemini-with-nemesis-3h-normal-maste-db-node-7e1d63a0-1 (54.194.60.216 | 10.4.0.25) (shards: 2)

OS / Image: ami-0d0c1a60290b29aa9 (aws: eu-west-1)

Test: gemini-3h-with-nemesis-test

Test id: 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc

Test name: scylla-master/gemini-/gemini-3h-with-nemesis-test

Test config file(s):

  • gemini-3h-with-nemesis.yaml

  • Restore Monitor Stack command: $ hydra investigate show-monitor 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc

  • Restore monitor on AWS instance using Jenkins job

  • Show all stored logs command: $ hydra investigate show-logs 7e1d63a0-dd49-4ed4-916e-c8d571b34bcc

Logs:

Jenkins job URL

fruch, Dec 04 '22 09:12