scylla-cluster-tests offline-install/artifacts-centos8-test fail on timeout connecting to GCE nodes

Seen on both #487 and #485. It's not clear from the console what the root cause is: whether we just need to increase the timeout or we have a real issue.

08:59:50  /usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://746f0ad652a3447d83b1572f657c67cb.us-east-1.aws.found.io:9243/ using SSL with verify_certs=False is insecure.
08:59:50    warnings.warn(
08:59:50  < t:2024-01-04 06:59:49,696 f:cluster_gce.py  l:499  c:sdcm.cluster         p:INFO  > GCE Cluster artifacts-centos8-jenkins-db-cluster-53d2c071 | Image: centos-stream-8 | Root Disk: pd-ssd 50 GB | Local SSD: 1 | Type: n1-standard-2: Adding nodes to cluster
09:00:44  < t:2024-01-04 07:00:38,378 f:cluster.py      l:1005 c:sdcm.cluster_gce     p:INFO  > Node artifacts-centos8-jenkins-db-node-53d2c071-0-1 [34.74.116.73 | 10.142.0.38] (seed: False): node_type = db
09:00:44  < t:2024-01-04 07:00:39,789 f:cluster_gce.py  l:515  c:sdcm.cluster         p:INFO  > GCE Cluster artifacts-centos8-jenkins-db-cluster-53d2c071 | Image: centos-stream-8 | Root Disk: pd-ssd 50 GB | Local SSD: 1 | Type: n1-standard-2: Added node: artifacts-centos8-jenkins-db-node-53d2c071-0-1
09:00:44  < t:2024-01-04 07:00:39,790 f:cluster_gce.py  l:519  c:sdcm.cluster         p:INFO  > GCE Cluster artifacts-centos8-jenkins-db-cluster-53d2c071 | Image: centos-stream-8 | Root Disk: pd-ssd 50 GB | Local SSD: 1 | Type: n1-standard-2: added nodes: [<sdcm.cluster_gce.GCENode object at 0x7f7be9c2b100>]
09:00:44  < t:2024-01-04 07:00:39,791 f:cluster.py      l:3098 c:sdcm.cluster         p:INFO  > GCE Cluster artifacts-centos8-jenkins-loader-set-53d2c071 | Image: centos-7 | Root Disk: pd-standard None GB | Type: e2-standard-2: Init nodes
09:00:44  < t:2024-01-04 07:00:41,052 f:cluster.py      l:514  c:sdcm.cluster_gce     p:INFO  > Node artifacts-centos8-jenkins-db-node-53d2c071-0-1 [34.74.116.73 | 10.142.0.38] (seed: True): Detected Linux distribution: CENTOS8
09:28:57  Cancelling nested steps due to timeout
09:28:57  Sending interrupt signal to process
09:29:12  Terminated
09:29:13  script returned exit code 143
09:29:13  [Pipeline] }
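
To see where the run went quiet before the timeout, one option is to look for the longest silence between consecutive console timestamps. The sketch below is only an illustration: `largest_gap` is a hypothetical helper, not part of scylla-cluster-tests, and it assumes every relevant line starts with the Jenkins `HH:MM:SS` prefix.

```python
from datetime import datetime, timedelta


def largest_gap(console_lines):
    """Return (gap, line_before, line_after) for the longest silence, or None.

    Hypothetical helper: assumes lines start with a Jenkins HH:MM:SS prefix.
    """
    stamped = []
    for line in console_lines:
        try:
            ts = datetime.strptime(line[:8], "%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        stamped.append((ts, line))

    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if gap < timedelta(0):
            gap += timedelta(days=1)  # console rolled past midnight
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Abbreviated lines taken from the console output above.
sample = [
    "09:00:44  ... Detected Linux distribution: CENTOS8",
    "09:28:57  Cancelling nested steps due to timeout",
    "09:29:12  Terminated",
]
gap, before, after = largest_gap(sample)
print(gap)  # 0:28:13
```

For this console excerpt, the longest silence is the 28 minutes between the distribution detection and the timeout, so whatever ran in that window is not visible in the console log.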

Jan 04 '24 08:01 benipeled

If there are not enough logs, one needs to run it with keep and take a look

Jan 04 '24 08:01 fruch

@benipeled have you seen this happening again?

Mar 04 '24 16:03 fruch

I'm not sure it's the same case but this one looks similar - https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-centos8-test/538/

01:08:06  < t:2024-02-22 23:08:03,544 f:cluster.py      l:507  c:sdcm.cluster_gce     p:INFO  > Node artifacts-centos8-jenkins-db-node-92bfa5c8-0-1 [34.23.71.148 | 10.142.0.56] (seed: True): Detected Linux distribution: CENTOS8
01:24:23  < t:2024-02-22 23:24:16,237 f:logcollector.py l:839  c:sdcm.logcollector    p:INFO  > Saving kallsyms map from host: artifacts-centos8-jenkins-db-node-92bfa5c8-0-1

Mar 04 '24 16:03 benipeled

I'm not sure it's the same case but this one looks similar - https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-centos8-test/538/

01:08:06  < t:2024-02-22 23:08:03,544 f:cluster.py      l:507  c:sdcm.cluster_gce     p:INFO  > Node artifacts-centos8-jenkins-db-node-92bfa5c8-0-1 [34.23.71.148 | 10.142.0.56] (seed: True): Detected Linux distribution: CENTOS8
01:24:23  < t:2024-02-22 23:24:16,237 f:logcollector.py l:839  c:sdcm.logcollector    p:INFO  > Saving kallsyms map from host: artifacts-centos8-jenkins-db-node-92bfa5c8-0-1

Looks a bit different. Anyhow, please reference the test id, otherwise it's hard to look it up (after it's gone from Jenkins): test_id=92bfa5c8-3cd5-4cc7-95bf-94543c64745a

Mar 04 '24 18:03 fruch

@benipeled

This seems like a slowdown of the centos8 repos:

CentOS Stream 8 - BaseOS                         49 kB/s |  10 MB     03:35

We see this from time to time. @roydahan suggested dropping centos8 if we already have rocky8, since it's basically the same.
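
A progress line like the one above could be flagged automatically. The following is a minimal sketch, not anything that exists in scylla-cluster-tests: `parse_dnf_progress`, `is_slow`, and the 500 kB/s threshold are hypothetical, the regex is fitted only to the single line quoted here, and treating kB/MB as multiples of 1024 is an assumption.

```python
import re

# Fitted to the one line quoted in this issue; other dnf output may differ.
DNF_PROGRESS = re.compile(
    r"^(?P<repo>.+?)\s+(?P<rate>[\d.]+)\s+(?P<rate_unit>[kMG]?B)/s\s+\|\s+"
    r"(?P<size>[\d.]+)\s+(?P<size_unit>[kMG]?B)\s+(?P<elapsed>\d+:\d+)$"
)

# Assumption: units are binary multiples.
UNIT = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_dnf_progress(line):
    """Parse a repo metadata progress line; return None if it does not match."""
    m = DNF_PROGRESS.match(line.strip())
    if m is None:
        return None
    minutes, seconds = map(int, m["elapsed"].split(":"))
    return {
        "repo": m["repo"],
        "rate_bps": float(m["rate"]) * UNIT[m["rate_unit"]],
        "size_bytes": float(m["size"]) * UNIT[m["size_unit"]],
        "elapsed_s": minutes * 60 + seconds,
    }


def is_slow(line, min_rate_bps=500 * 1024):
    """True if the line parses and its rate is below the (hypothetical) threshold."""
    parsed = parse_dnf_progress(line)
    return parsed is not None and parsed["rate_bps"] < min_rate_bps


line = "CentOS Stream 8 - BaseOS                         49 kB/s |  10 MB     03:35"
print(parse_dnf_progress(line)["elapsed_s"], is_slow(line))  # 215 True
```

As a sanity check, 10 MB in 215 seconds is a little under 50 kB/s, which is consistent with the reported rate.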

Mar 04 '24 20:03 fruch

We already have an issue for the centos8 slowdowns, closing this one.

May 27 '24 21:05 fruch
