scylla-operator
scylla-operator copied to clipboard
Scylla doesn't yet listen for CQL connections even though it's ready
I0826 13:26:41.857697 1 framework/framework.go:244] "Namespace removed." Namespace="e2e-test-scyllacluster-g87j8-cwrjq"
• Failure [182.680 seconds]
ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:24
agent requires authentication [It]
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
Unexpected error:
<*fmt.wrapError | 0xc0004c2940>: {
msg: "can't select data: gocql: no hosts availa
ble in the pool",
err: {
s: "gocql: no hosts available in the pool",
},
}
can't select data: gocql: no hosts available in the pool
occurred
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119
Full Stack Trace
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.verifyScyllaCluster(0x1cef4e8, 0xc00007d920, 0x1d12c18, 0xc0008d5600, 0xc000433900, 0xc0000a41e0)
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119 +0x12f8
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.glob..func1.1()
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:161 +0x2359
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0002c8c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xa3
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0002c8c00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x15c
github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc0002e4240, 0x1cbc2a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x87
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0005c6e10, 0x0, 0x1cbc2a0, 0xc000305ec0)
github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x72f
github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0005c6e10, 0x1cbc2a0, 0xc000305ec0)
github.com/onsi/[email protected]/internal/spec/spec.go:138 +0xf2
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc000155e40, 0xc0005c6e10, 0x0)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x111
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc000155e40, 0x1)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x147
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc000155e40, 0xc000357e20)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc00038ff80, 0x7f57e66c5c20, 0x28a7978, 0x1ac913b, 0x19, 0xc0002e4080, 0x2, 0x2, 0x1cf6878, 0xc000305ec0, ...)
github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x546
github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0002e4060, 0x2, 0x2, 0x2)
github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x218
github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0003938e0, 0x1, 0x1, 0x1)
github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0xad
github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).run(0xc000152b00, 0x1cef478, 0xc000304f40, 0x1cbd460, 0xc000130000, 0x1cbd480, 0xc000130008, 0x1cbd480, 0xc000130010, 0x1, ...)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:196 +0x257
github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).Run(0xc000152b00, 0x1cbd460, 0xc000130000, 0x1cbd480, 0xc000130008, 0x1cbd480, 0xc000130010, 0x1aad033, 0x3, 0x0, ...)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:149 +0x357
github.com/scylladb/scylla-operator/pkg/cmd/tests.NewRunCommand.func1(0xc00023a000, 0xc0000942c0, 0x0, 0x2, 0x0, 0x0)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:81 +0xfd
github.com/spf13/cobra.(*Command).execute(0xc00023a000, 0xc0000942a0, 0x2, 0x2, 0xc00023a000, 0xc0000942a0)
github.com/spf13/[email protected]/command.go:850 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001a3340, 0xc000130000, 0x1cbd480, 0xc000130008)
github.com/spf13/[email protected]/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/[email protected]/command.go:895
main.main()
github.com/scylladb/scylla-operator/cmd/scylla-operator-tests/scylla-operator-tests.go:37 +0x23d
Service IP doesn't change in the meantime and cluster rolls out fine. Flaky behaviour might be an unhandled case in gocql lib. I'm currently trying to reproduce it with gocql debug logs turned on.
------------------------------
ScyllaCluster authentication
agent requires authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
[BeforeEach] ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/framework/framework.go:57
Oct 4 09:05:17.237: STEP: Creating a new namespace
Oct 4 09:05:17.243: INFO: Created namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct 4 09:05:17.272: STEP: Waiting for ServiceAccount "e2e-user" in namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct 4 09:05:17.285: STEP: Waiting for default ServiceAccount in namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
[It] agent requires authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
Oct 4 09:05:17.288: STEP: Creating a ScyllaCluster
Oct 4 09:05:17.317: STEP: Waiting for the ScyllaCluster to deploy
Oct 4 09:05:59.297: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3029) is rolled out
Oct 4 09:05:59.310: STEP: Inserting data with RF=1
Oct 4 09:05:59.488: STEP: Verifying the ScyllaCluster
W1004 09:05:59.500618 1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct 4 09:05:59.507: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct 4 09:05:59.507: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
Oct 4 09:05:59.539: STEP: Reading data with RF=1
Oct 4 09:05:59.543: STEP: Verifying data consistency
Oct 4 09:05:59.543: STEP: Rejecting an unauthorized request
Oct 4 09:06:00.565: STEP: Accepting requests authorized using token from provisioned secret
Oct 4 09:06:00.587: STEP: Specifying auth token in agent config
Oct 4 09:06:00.604: STEP: Initiating a rolling restart
Oct 4 09:06:00.615: STEP: Waiting for the ScyllaCluster to pick up token change
Oct 4 09:07:47.223: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3376) is rolled out
Oct 4 09:07:47.224: STEP: Verifying the ScyllaCluster
W1004 09:07:47.234636 1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct 4 09:07:47.237: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct 4 09:07:47.237: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
Oct 4 09:07:47.274: STEP: Reading data with RF=1
Oct 4 09:07:47.285: STEP: Verifying data consistency
Oct 4 09:07:47.285: STEP: Accepting requests authorized using token from user agent config
Oct 4 09:07:47.303: STEP: Changing auth token in agent config
Oct 4 09:07:47.307: STEP: Initiating a rolling restart
Oct 4 09:07:47.321: STEP: Waiting for the ScyllaCluster to pick up token change
Oct 4 09:07:47.323: INFO: ScyllaCluster e2e-test-scyllacluster-dmsv4-cvmlk/basic (RV=3376) is rolled out
Oct 4 09:07:47.324: STEP: Verifying the ScyllaCluster
W1004 09:07:47.331903 1 rest/warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
Oct 4 09:07:47.337: INFO: Found 1 pvc(s) in namespace "e2e-test-scyllacluster-dmsv4-cvmlk"
Oct 4 09:07:47.337: INFO: Found 1 pvc(s) for ScyllaCluster "e2e-test-scyllacluster-dmsv4-cvmlk/basic"
{"L":"INFO","T":"2021-10-04T09:07:48.379Z","N":"scylla_client","M":"HTTP","host":"10.104.234.250:10001","method":"GET","uri":"/storage_service/host_id","duration":"1036ms","status":401,"bytes":37,"dump":"HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 37\r\nContent-Type: text/plain; charset=utf-8\r\nDate: Mon, 04 Oct 2021 09:07:48 GMT\r\n\r\n{\"message\":\"unauthorized\",\"code\":401}"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"agent [HTTP 401] unauthorized"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.380Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.381Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
{"L":"INFO","T":"2021-10-04T09:07:48.381Z","N":"scylla_client","M":"HTTP retry now","operation":"StorageServiceHostIdGet","error":"dial tcp 10.104.234.250:10001: connect: connection refused"}
[...]
Oct 4 09:08:11.848: STEP: Reading data with RF=1
[AfterEach] ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/framework/framework.go:58
Oct 4 09:08:11.849: STEP: Collecting events from namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct 4 09:08:11.880: STEP: Found 65 events.
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-auth-token: {scyllacluster-controller } SecretCreated: Secret e2e-test-scyllacluster-dmsv4-cvmlk/basic-auth-token created
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulCreate: create Claim data-basic-us-east-1-us-east-1a-0 Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a success
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulCreate: create Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a successful
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a: {scyllacluster-controller } StatefulSetCreated: StatefulSet e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a created
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } FailedScheduling: 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {k8s.io/minikube-hostpath_fv-az133-203_2e95ad47-9a16-4f02-9d8b-240a09ae978d } Provisioning: External provisioner is provisioning volume for claim "e2e-test-scyllacluster-dmsv4-cvmlk/data-basic-us-east-1-us-east-1a-0"
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {persistentvolume-controller } ExternalProvisioning: waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:17 +0000 UTC - event for data-basic-us-east-1-us-east-1a-0: {k8s.io/minikube-hostpath_fv-az133-203_2e95ad47-9a16-4f02-9d8b-240a09ae978d } ProvisioningSucceeded: Successfully provisioned volume pvc-4dd5f68a-7054-463d-9bcc-91a89f636b05
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:20 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic: {scyllacluster-controller } PodDisruptionBudgetCreated: PodDisruptionBudget e2e-test-scyllacluster-dmsv4-cvmlk/basic created
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-client: {scyllacluster-controller } ServiceCreated: Service e2e-test-scyllacluster-dmsv4-cvmlk/basic-client created
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {scyllacluster-controller } ServiceCreated: Service e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 created
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:28 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:29 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Startup probe failed: HTTP probe failed with statuscode: 503
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:05:49 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-auth-token: {scyllacluster-controller } SecretUpdated: Secret e2e-test-scyllacluster-dmsv4-cvmlk/basic-auth-token updated
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a: {statefulset-controller } SuccessfulDelete: delete Pod basic-us-east-1-us-east-1a-0 in StatefulSet basic-us-east-1-us-east-1a successful
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a: {scyllacluster-controller } StatefulSetUpdated: StatefulSet e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a updated
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:00 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:17 +0000 UTC - event for basic: {controllermanager } NoPods: No matching pods found
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:17 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:19 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:27 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:06:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:07 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:08 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:16 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:37 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Unhealthy: Readiness probe failed: HTTP probe failed with statuscode: 503
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:47 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Killing: Stopping container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:57 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {default-scheduler } Scheduled: Successfully assigned e2e-test-scyllacluster-dmsv4-cvmlk/basic-us-east-1-us-east-1a-0 to fv-az133-203
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-operator:ci" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:07:58 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container sidecar-injection
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla:4.4.2" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Created: Created container scylla-manager-agent
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Pulled: Container image "docker.io/scylladb/scylla-manager-agent:2.3.0" already present on machine
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:06 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla
Oct 4 09:08:11.881: INFO: At 2021-10-04 09:08:07 +0000 UTC - event for basic-us-east-1-us-east-1a-0: {kubelet fv-az133-203} Started: Started container scylla-manager-agent
Oct 4 09:08:11.881: STEP: Collecting dumps from namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
W1004 09:08:12.528905 1 rest/warnings.go:70] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Oct 4 09:08:12.584: STEP: Destroying namespace "e2e-test-scyllacluster-dmsv4-cvmlk".
Oct 4 09:08:12.593: STEP: Waiting for namespace "e2e-test-scyllacluster-dmsv4-cvmlk" to be removed.
I1004 09:08:39.440145 1 framework/framework.go:244] "Namespace removed." Namespace="e2e-test-scyllacluster-dmsv4-cvmlk"
• Failure [202.203 seconds]
ScyllaCluster authentication
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:24
agent requires authentication [It]
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:29
Unexpected error:
<*fmt.wrapError | 0xc0007c3f40>: {
msg: "can't select data: gocql: no hosts available in the pool",
err: {
s: "gocql: no hosts available in the pool",
},
}
can't select data: gocql: no hosts available in the pool
occurred
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119
Full Stack Trace
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.verifyScyllaCluster(0x1cef4e8, 0xc0000b55c0, 0x1d12c18, 0xc0008106e0, 0xc0003da000, 0xc000773db0)
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:119 +0x12f8
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster.glob..func1.1()
github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/scyllacluster_auth.go:161 +0x2359
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0000861e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/runner.go:113 +0xa3
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0000861e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/runner.go:64 +0x15c
github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00002c1e0, 0x1cbc2a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/onsi/[email protected]/internal/leafnodes/it_node.go:26 +0x87
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0004b2b40, 0x0, 0x1cbc2a0, 0xc000285ec0)
github.com/onsi/[email protected]/internal/spec/spec.go:215 +0x72f
github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0004b2b40, 0x1cbc2a0, 0xc000285ec0)
github.com/onsi/[email protected]/internal/spec/spec.go:138 +0xf2
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0002aa580, 0xc0004b2b40, 0x1)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:200 +0x111
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0002aa580, 0x1)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:170 +0x147
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0002aa580, 0xc000041d20)
github.com/onsi/[email protected]/internal/specrunner/spec_runner.go:66 +0x117
github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc00030df10, 0x7faacded4728, 0x28a7978, 0x1ac913b, 0x19, 0xc00002c020, 0x2, 0x2, 0x1cf6878, 0xc000285ec0, ...)
github.com/onsi/[email protected]/internal/suite/suite.go:79 +0x546
github.com/onsi/ginkgo.RunSpecsWithCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc00002c000, 0x2, 0x2, 0x2)
github.com/onsi/[email protected]/ginkgo_dsl.go:229 +0x218
github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters(0x1cbc320, 0x28a7978, 0x1ac913b, 0x19, 0xc0003fe010, 0x1, 0x1, 0x1)
github.com/onsi/[email protected]/ginkgo_dsl.go:217 +0xad
github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).run(0xc0000d2e00, 0x1cef478, 0xc000350000, 0x1cbd460, 0xc0000b2000, 0x1cbd480, 0xc0000b2008, 0x1cbd480, 0xc0000b2010, 0x1, ...)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:196 +0x257
github.com/scylladb/scylla-operator/pkg/cmd/tests.(*RunOptions).Run(0xc0000d2e00, 0x1cbd460, 0xc0000b2000, 0x1cbd480, 0xc0000b2008, 0x1cbd480, 0xc0000b2010, 0x1aad033, 0x3, 0x0, ...)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:149 +0x357
github.com/scylladb/scylla-operator/pkg/cmd/tests.NewRunCommand.func1(0xc0001b8b00, 0xc0008822c0, 0x0, 0x2, 0x0, 0x0)
github.com/scylladb/scylla-operator/pkg/cmd/tests/tests_run.go:81 +0xfd
github.com/spf13/cobra.(*Command).execute(0xc0001b8b00, 0xc0008822a0, 0x2, 0x2, 0xc0001b8b00, 0xc0008822a0)
github.com/spf13/[email protected]/command.go:850 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001b8840, 0xc0000b2000, 0x1cbd480, 0xc0000b2008)
github.com/spf13/[email protected]/command.go:958 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/[email protected]/command.go:895
main.main()
github.com/scylladb/scylla-operator/cmd/scylla-operator-tests/scylla-operator-tests.go:37 +0x23d
------------------------------
It seems that flakiness comes from the fact that scylla's readiness probe is only based on checking whether the CQL native transport is enabled and not whether it listens for clients. In a failing scenario we wait for the ScyllaCluster to rollout and then try to read while gocql hasn't been able to declare the host as UN yet, hence the error.
We should extend the readiness probe with connecting to the host on the native transport port, similar to how it's done in scylla-manager (https://github.com/scylladb/scylla-manager/blob/0456d3bc9657b549cd5593fa3d3ade3072c2d2af/pkg/ping/cqlping/cqlping.go#L47). Currently, using Scylla's REST API it is difficult to establish whether the port returned by a request to config/native_transport_port
is encrypted and whether config/native_transport_port_ssl
was set at all or if it's a default, unused value (see the issue raised by @zimnx https://github.com/scylladb/scylla/issues/7206). We should probably either push for the changes requested by Maciej in the issue linked above or at the least for an API endpoint that would allow us to determine if the native transport port uses encryption, to avoid making multiple requests and handling different errors in the scenarios described in the issue.
good job, tracing this down. We need /readyz in scylla, pinging all ports and trying to figure out which ones should it be sucks and current API doesn't event allow to find out which ones.
best if we can get some traction on https://github.com/scylladb/scylla/issues/8275
At the moment, to check Scylla's readiness, we verify that Scylla is in UN (Up Normal) mode and that native transport is enabled. This is not reliable though, as the CQL native transport port may not be listening for clients yet despite /storage_service/native_transport
returning OKs. We could take an additional measure of dialing the CQL native transport port, but there is another problem with establishing which port to use and whether it is encrypted (see above comments and https://github.com/scylladb/scylla/issues/7206). Scylla-manager handles it with a workaround (https://github.com/scylladb/scylla-manager/blob/0456d3bc9657b549cd5593fa3d3ade3072c2d2af/pkg/scyllaclient/client_agent.go#L73-L97) but this is undesired.
We either need /readyz
as proposed by @tnozicka or a fix to the issues described above.
@slivne what do you think?
This should have been fixed since scylla-2022.1.0 by https://github.com/scylladb/scylladb/commit/a51529dd1547fe029e8c259f8b03891106f6edc0
@nuivall let's discuss this in scylladb/scylladb#12376
As per @nuivall's comment, the issue with /storage_service/native_transport
for CQL has been fixed in https://github.com/scylladb/scylladb/commit/a51529dd1547fe029e8c259f8b03891106f6edc0, so the current implementation of the readiness probe should be correct in terms of Scylla listening for CQL connections. The dedicated endpoints (https://github.com/scylladb/scylladb/issues/8275) are a separate discussion though. I believe the thrift fix https://github.com/scylladb/scylladb/pull/13319 doesn't affect us - @zimnx shall we close this one or observe this?
Lets close it, it's available since 5.0.0 which we are already using in CI.