scylla-operator icon indicating copy to clipboard operation
scylla-operator copied to clipboard

Scylla doesn't yet listen for CQL connections even though it's ready

Open zimnx opened this issue 3 years ago • 6 comments

I0826 13:26:41.857697       1 framework/framework.go:244] "Namespace removed." Namespace="e2e-test-scyllacluster-g87j8-cwrjq"

• Failure [182.680 seconds]
ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:24
  agent requires authentication [It]
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29

  Unexpected error:
      <*fmt.wrapError | 0xc0004c2940>: {
          msg: "can't select data: gocql: no hosts availa
ble in the pool",
          err: {
              s: "gocql: no hosts available in the pool",
          },
      }
      can't select data: gocql: no hosts available in the pool
  occurred

  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119

  Full Stack Trace
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.verifyScyllaCluster(0x1cef4e8, 0xc00007d920, 0x1d12c18, 0xc0008d5600, 0xc000433900, 0xc0000a41e0)
  	github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119 +0x12f8
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.glob..func1.1()
  	github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:161 +0x2359
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002c8c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xa3
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002c8c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x15c
  github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc0002e4240, 0x1cbc2a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x87
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0005c6e10, 0x0, 0x1cbc2a0, 0xc000305ec0)
  	github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x72f
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0005c6e10, 0x1cbc2a0, 0xc000305ec0)
  	github.com/onsi/[email protected]/internal/spec/spec.go:138 +0xf2
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc000155e40, 0xc0005c6e10, 0x0)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x111
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc000155e40, 0x1)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x147
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc000155e40, 0xc000357e20)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc00038ff80, 0x7f57e66c5c20, 0x28a7978, 0x1ac913b, 0x19, 0xc0002e4080, 0x2, 0x2, 0x1cf6878, 0xc000305ec0, ...)
  	github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x546
  github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0002e4060, 0x2, 0x2, 0x2)
  	github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x218
  github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0003938e0, 0x1, 0x1, 0x1)
  	github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0xad
  github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).run(0xc000152b00, 0x1cef478, 0xc000304f40, 0x1cbd460, 0xc000130000, 0x1cbd480, 0xc000130008, 0x1cbd480, 0xc000130010, 0x1, ...)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:196 +0x257
  github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).Run(0xc000152b00, 0x1cbd460, 0xc000130000, 0x1cbd480, 0xc000130008, 0x1cbd480, 0xc000130010, 0x1aad033, 0x3, 0x0, ...)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:149 +0x357
  github.com/scylladb/scylla-operator/pkg/cmd/tests.NewRunCommand.func1(0xc00023a000, 0xc0000942c0, 0x0, 0x2, 0x0, 0x0)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:81 +0xfd
  github.com/spf13/cobra.(*Command).execute(0xc00023a000, 0xc0000942a0, 0x2, 0x2, 0xc00023a000, 0xc0000942a0)
  	github.com/spf13/[email protected]/command.go:850 +0x472
  github.com/spf13/cobra.(*Command).ExecuteC(0xc0001a3340, 0xc000130000, 0x1cbd480, 0xc000130008)
  	github.com/spf13/[email protected]/command.go:958 +0x375
  github.com/spf13/cobra.(*Command).Execute(...)
  	github.com/spf13/[email protected]/command.go:895
  main.main()
  	github.com/scylladb/scylla-operator/cmd/scylla-operator-tests/scylla-operator-tests.go:37 +0x23d

e2e-artifacts.tar.lz4 (20).zip logs_3194.zip

zimnx avatar Aug 26 '21 14:08 zimnx

Service IP doesn't change in the meantime and cluster rolls out fine. Flaky behaviour might be an unhandled case in gocql lib. I'm currently trying to reproduce it with gocql debug logs turned on.

rzetelskik avatar Sep 30 '21 12:09 rzetelskik

------------------------------
ScyllaCluster authentication 
  agent requires authentication
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
[BeforeEach] ScyllaCluster authentication
  github.com/scylladb/scylla-operator/test/e2e/framework/framework.go:57
Oct  4 09:05:17.237: STEP: Creating a new namespace
Oct  4 09:05:17.243: INFO: Created namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct  4 09:05:17.272: STEP: Waiting for ServiceAccount "e2e-user" in namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct  4 09:05:17.285: STEP: Waiting for default ServiceAccount in namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
[It] agent requires authentication
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
Oct  4 09:05:17.288: STEP: Creating a ScyllaCluster
Oct  4 09:05:17.317: STEP: Waiting for the ScyllaCluster to deploy
Oct  4 09:05:59.297: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3029) is rolled out
Oct  4 09:05:59.310: STEP: Inserting data with RF=1
Oct  4 09:05:59.488: STEP: Verifying the ScyllaCluster
W1004 09:05:59.500618       1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct  4 09:05:59.507: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct  4 09:05:59.507: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
Oct  4 09:05:59.539: STEP: Reading data with RF=1
Oct  4 09:05:59.543: STEP: Verifying data consistency
Oct  4 09:05:59.543: STEP: Rejecting an unauthorized request
Oct  4 09:06:00.565: STEP: Accepting requests authorized using token from provisioned secret
Oct  4 09:06:00.587: STEP: Specifying auth token in agent config
Oct  4 09:06:00.604: STEP: Initiating a rolling restart
Oct  4 09:06:00.615: STEP: Waiting for the ScyllaCluster to pick up token change
Oct  4 09:07:47.223: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3376) is rolled out
Oct  4 09:07:47.224: STEP: Verifying the ScyllaCluster
W1004 09:07:47.234636       1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct  4 09:07:47.237: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct  4 09:07:47.237: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
Oct  4 09:07:47.274: STEP: Reading data with RF=1
Oct  4 09:07:47.285: STEP: Verifying data consistency
Oct  4 09:07:47.285: STEP: Accepting requests authorized using token from user agent config
Oct  4 09:07:47.303: STEP: Changing auth token in agent config
Oct  4 09:07:47.307: STEP: Initiating a rolling restart
Oct  4 09:07:47.321: STEP: Waiting for the ScyllaCluster to pick up token change
Oct  4 09:07:47.323: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3376) is rolled out
Oct  4 09:07:47.324: STEP: Verifying the ScyllaCluster
W1004 09:07:47.331903       1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct  4 09:07:47.337: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct  4 09:07:47.337: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
{"L":"INFO","T":"2021-10-04T09:07:48.379Z","N":"scylla_client","M":"HTTP","host":"10.104.234.250:10001","method":"GET","uri":"/storage_service/host_id","duration":"1036ms","status":401,"bytes":37,"dump":"HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 37\r\nContent-Type: text/plain; charset=utf-8\r\nDate: Mon, 04 Oct 2021 09:07:48 GMT\r\n\r\n{\"message\":\"unauthorized\",\"code\":401}"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"agent [HTTP 401] unauthorized"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.381Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.381Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
[...]
Oct  4 09:08:11.848: STEP: Reading data with RF=1
[AfterEach] ScyllaCluster authentication
  github.com/scylladb/scylla-operator/test/e2e/framework/framework.go:58
Oct  4 09:08:11.849: STEP: Collecting events from namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct  4 09:08:11.880: STEP: Found 65 events.
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-auth-token: {scyllacluster-controller } SecretCreated: Secret e2e-test-scyllacluster-dmsv4-cvmlk/basic-auth-token created
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulCreate: create Claim data-basic-us-east-1-us-east-1a-0 Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a success
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulCreate: create Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a successful
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {scyllacluster-controller } StatefulSetCreated: StatefulSet e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a created
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } FailedScheduling: 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {k8s.io/minikube-hostpath_fv-az133-203_2e95ad47-9a16-4f02-9d8b-240a09ae978d } Provisioning: External provisioner is provisioning volume for claim "e2e-test-scyllacluster-dmsv4-cvmlk/data-basic-us-east-1-us-east-1a-0"
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {persistentvolume-controller } ExternalProvisioning: waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {k8s.io/minikube-hostpath_fv-az133-203_2e95ad47-9a16-4f02-9d8b-240a09ae978d } ProvisioningSucceeded: Successfully provisioned volume pvc-4dd5f68a-7054-463d-9bcc-91a89f636b05
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic: {scyllacluster-controller } PodDisruptionBudgetCreated: PodDisruptionBudget e2e-test-scyllacluster-dmsv4-cvmlk/basic created
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-client: {scyllacluster-controller } ServiceCreated: Service e2e-test-scyllacluster-dmsv4-cvmlk/basic-client created
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {scyllacluster-controller } ServiceCreated: Service e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 created
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:28 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:29 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Startup probe failed: HTTP probe failed with statuscode: 503
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:05:49 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-auth-token: {scyllacluster-controller } SecretUpdated: Secret e2e-test-scyllacluster-dmsv4-cvmlk/basic-auth-token updated
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulDelete: delete Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a successful
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a: {scyllacluster-controller } StatefulSetUpdated: StatefulSet e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a updated
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:17 +0000 UTC - event for basic: {controllermanager } NoPods: No matching pods found
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:17 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:06:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:07 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:37 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct  4 09:08:11.881: INFO: At 2021-10-04 09:08:07 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct  4 09:08:11.881: STEP: Collecting dumps from namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
W1004 09:08:12.528905       1 rest/warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Oct  4 09:08:12.584: STEP: Destroying namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct  4 09:08:12.593: STEP: Waiting for namespace "e2e-test-scyllacluster-dmsv4-cvmlk" to be removed.
I1004 09:08:39.440145       1 framework/framework.go:244] "Namespace removed." Namespace="e2e-test-scyllacluster-dmsv4-cvmlk"

• Failure [202.203 seconds]
ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:24
  agent requires authentication [It]
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29

  Unexpected error:
      <*fmt.wrapError | 0xc0007c3f40>: {
          msg: "can't select data: gocql: no hosts available in the pool",
          err: {
              s: "gocql: no hosts available in the pool",
          },
      }
      can't select data: gocql: no hosts available in the pool
  occurred

  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119

  Full Stack Trace
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.verifyScyllaCluster(0x1cef4e8, 0xc0000b55c0, 0x1d12c18, 0xc0008106e0, 0xc0003da000, 0xc000773db0)
  	github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119 +0x12f8
  github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.glob..func1.1()
  	github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:161 +0x2359
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0000861e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xa3
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0000861e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x15c
  github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00002c1e0, 0x1cbc2a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  	github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x87
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0004b2b40, 0x0, 0x1cbc2a0, 0xc000285ec0)
  	github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x72f
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0004b2b40, 0x1cbc2a0, 0xc000285ec0)
  	github.com/onsi/[email protected]/internal/spec/spec.go:138 +0xf2
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0002aa580, 0xc0004b2b40, 0x1)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x111
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0002aa580, 0x1)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x147
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0002aa580, 0xc000041d20)
  	github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc00030df10, 0x7faacded4728, 0x28a7978, 0x1ac913b, 0x19, 0xc00002c020, 0x2, 0x2, 0x1cf6878, 0xc000285ec0, ...)
  	github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x546
  github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc00002c000, 0x2, 0x2, 0x2)
  	github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x218
  github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0003fe010, 0x1, 0x1, 0x1)
  	github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0xad
  github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).run(0xc0000d2e00, 0x1cef478, 0xc000350000, 0x1cbd460, 0xc0000b2000, 0x1cbd480, 0xc0000b2008, 0x1cbd480, 0xc0000b2010, 0x1, ...)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:196 +0x257
  github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).Run(0xc0000d2e00, 0x1cbd460, 0xc0000b2000, 0x1cbd480, 0xc0000b2008, 0x1cbd480, 0xc0000b2010, 0x1aad033, 0x3, 0x0, ...)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:149 +0x357
  github.com/scylladb/scylla-operator/pkg/cmd/tests.NewRunCommand.func1(0xc0001b8b00, 0xc0008822c0, 0x0, 0x2, 0x0, 0x0)
  	github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:81 +0xfd
  github.com/spf13/cobra.(*Command).execute(0xc0001b8b00, 0xc0008822a0, 0x2, 0x2, 0xc0001b8b00, 0xc0008822a0)
  	github.com/spf13/[email protected]/command.go:850 +0x472
  github.com/spf13/cobra.(*Command).ExecuteC(0xc0001b8840, 0xc0000b2000, 0x1cbd480, 0xc0000b2008)
  	github.com/spf13/[email protected]/command.go:958 +0x375
  github.com/spf13/cobra.(*Command).Execute(...)
  	github.com/spf13/[email protected]/command.go:895
  main.main()
  	github.com/scylladb/scylla-operator/cmd/scylla-operator-tests/scylla-operator-tests.go:37 +0x23d
------------------------------


e2e-artifacts.tar.lz4 (36).zip

zimnx avatar Oct 04 '21 10:10 zimnx

It seems that flakiness comes from the fact that scylla's readiness probe is only based on checking whether the CQL native transport is enabled and not whether it listens for clients. In a failing scenario we wait for the ScyllaCluster to rollout and then try to read while gocql hasn't been able to declare the host as UN yet, hence the error. We should extend the readiness probe with connecting to the host on the native transport port, similar to how it's done in scylla-manager (https://github.com/scylladb/scylla-manager/blob/0456d3bc9657b549cd5593fa3d3ade3072c2d2af/pkg/ping/cqlping/cqlping.go#L47). Currently, using Scylla's REST API it is difficult to establish whether the port returned by a request to config/native_transport_port is encrypted and whether config/native_transport_port_ssl was set at all or if it's a default, unused value (see the issue raised by @zimnx https://github.com/scylladb/scylla/issues/7206). We should probably either push for the changes requested by Maciej in the issue linked above or at the least for an API endpoint that would allow us to determine if the native transport port uses encryption, to avoid making multiple requests and handling different errors in the scenarios described in the issue.

rzetelskik avatar Oct 07 '21 08:10 rzetelskik

good job, tracing this down. We need /readyz in scylla, pinging all ports and trying to figure out which ones should it be sucks and current API doesn't event allow to find out which ones.

tnozicka avatar Oct 07 '21 11:10 tnozicka

best if we can get some traction on https://github.com/scylladb/scylla/issues/8275

tnozicka avatar Oct 07 '21 11:10 tnozicka

At the moment, to check Scylla's readiness, we verify that Scylla is in UN (Up Normal) mode and that native transport is enabled. This is not reliable though, as the CQL native transport port may not be listening for clients yet despite /storage_service/native_transport returning OKs. We could take an additional measure of dialing the CQL native transport port, but there is another problem with establishing which port to use and whether it is encrypted (see above comments and https://github.com/scylladb/scylla/issues/7206). Scylla-manager handles it with a workaround (https://github.com/scylladb/scylla-manager/blob/0456d3bc9657b549cd5593fa3d3ade3072c2d2af/pkg/scyllaclient/client_agent.go#L73-L97) but this is undesired.

We either need /readyz as proposed by @tnozicka or a fix to the issues described above. @slivne what do you think?

rzetelskik avatar Oct 14 '21 17:10 rzetelskik

This should have been fixed since scylla-2022.1.0 by https://github.com/scylladb/scylladb/commit/a51529dd1547fe029e8c259f8b03891106f6edc0

nuivall avatar Mar 24 '23 13:03 nuivall

@nuivall let's discuss this in scylladb/scylladb#12376

rzetelskik avatar Mar 28 '23 14:03 rzetelskik

As per @nuivall's comment, the issue with /storage_service/native_transport for CQL has been fixed in https://github.com/scylladb/scylladb/commit/a51529dd1547fe029e8c259f8b03891106f6edc0, so the current implementation of the readiness probe should be correct in terms of Scylla listening for CQL connections. The dedicated endpoints (https://github.com/scylladb/scylladb/issues/8275) are a separate discussion though. I believe the thrift fix https://github.com/scylladb/scylladb/pull/13319 doesn't affect us - @zimnx shall we close this one or observe this?

rzetelskik avatar Mar 30 '23 10:03 rzetelskik

Lets close it, it's available since 5.0.0 which we are already using in CI.

zimnx avatar Mar 30 '23 11:03 zimnx