scylla-operator ScyllaCluster's joined across namespaces in e2e

In one of our e2e runs, I saw Scylla report live hosts from 2 distinct namespaces:

  Expected
      <[]string | len:7, cap:8>: ["10.101.211.39", "10.103.182.113", "10.104.198.68", "10.104.212.153", "10.105.244.156", "10.107.86.7", "10.109.55.176"]
  to have length 3
  In [It] at: github.com/scylladb/scylla-operator/test/e2e/set/scyllacluster/verify.go:127

e2e-namespaces/e2e-test-scyllacluster-kx44s-b4mlz/core_v1/services/basic-us-east-1-rack-0-1.yaml:27:    clusterIP: 10.109.55.176
e2e-namespaces/e2e-test-scyllacluster-kx44s-b4mlz/core_v1/services/basic-us-east-1-rack-0-2.yaml:27:    clusterIP: 10.101.211.39
e2e-namespaces/e2e-test-scyllacluster-kx44s-b4mlz/core_v1/services/basic-us-east-1-rack-0-0.yaml:27:    clusterIP: 10.104.198.68

e2e-namespaces/e2e-test-scyllacluster-9t28c-jdmfm/core_v1/services/basic-us-east-1-rack-1-0.yaml:27:    clusterIP: 10.104.212.153
e2e-namespaces/e2e-test-scyllacluster-9t28c-jdmfm/core_v1/services/basic-us-east-1-rack-0-1.yaml:27:    clusterIP: 10.105.244.156
e2e-namespaces/e2e-test-scyllacluster-9t28c-jdmfm/core_v1/services/basic-us-east-1-rack-0-2.yaml:27:    clusterIP: 10.107.86.7
e2e-namespaces/e2e-test-scyllacluster-9t28c-jdmfm/core_v1/services/basic-us-east-1-rack-0-0.yaml:27:    clusterIP: 10.103.182.113
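To make the overlap easier to see, here is a minimal Go sketch that buckets the 7 reported live hosts by the namespace owning a Service with that clusterIP. The IPs and namespaces are copied from the output above; the helper `groupByNamespace` and the variable names are hypothetical and are not part of the operator or its e2e suite.

```go
package main

import (
	"fmt"
	"sort"
)

// liveHosts is the list reported by the failing assertion above.
var liveHosts = []string{
	"10.101.211.39", "10.103.182.113", "10.104.198.68", "10.104.212.153",
	"10.105.244.156", "10.107.86.7", "10.109.55.176",
}

// ipToNamespace maps each Service clusterIP to its namespace,
// taken from the must-gather listing above.
var ipToNamespace = map[string]string{
	"10.109.55.176":  "e2e-test-scyllacluster-kx44s-b4mlz",
	"10.101.211.39":  "e2e-test-scyllacluster-kx44s-b4mlz",
	"10.104.198.68":  "e2e-test-scyllacluster-kx44s-b4mlz",
	"10.104.212.153": "e2e-test-scyllacluster-9t28c-jdmfm",
	"10.105.244.156": "e2e-test-scyllacluster-9t28c-jdmfm",
	"10.107.86.7":    "e2e-test-scyllacluster-9t28c-jdmfm",
	"10.103.182.113": "e2e-test-scyllacluster-9t28c-jdmfm",
}

// groupByNamespace (hypothetical helper) buckets each live host IP under the
// namespace that owns it. IPs with no known Service land under the "" key.
func groupByNamespace(hosts []string, owners map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for _, ip := range hosts {
		ns := owners[ip]
		groups[ns] = append(groups[ns], ip)
	}
	return groups
}

func main() {
	groups := groupByNamespace(liveHosts, ipToNamespace)

	// Sort namespaces so the output order is stable.
	namespaces := make([]string, 0, len(groups))
	for ns := range groups {
		namespaces = append(namespaces, ns)
	}
	sort.Strings(namespaces)

	for _, ns := range namespaces {
		fmt.Printf("%s: %d hosts %v\n", ns, len(groups[ns]), groups[ns])
	}
}
```

Every reported host resolves to one of the two namespaces: 3 belong to the cluster under test and the other 4 to the second cluster.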

https://github.com/scylladb/scylla-operator/runs/4819154541?check_suite_focus=true#step:12:816 https://github.com/scylladb/scylla-operator/suites/4940241257/artifacts/143046650

Jan 17 '22 09:01 tnozicka

How come there was network connectivity between them? Aren't the clusters isolated?

Jan 17 '22 09:01 slivne

@slivne they aren't; it happened in our e2e. Our e2e minikube doesn't support NetworkPolicies by default. We would have to install a custom CNI. In production envs, users may configure a NetworkPolicy on their own to isolate Pods.

Having one would also hide a possible issue. The problem is that they shouldn't know anything about each other (their configs should be separate), yet they managed to connect. Unless Scylla does discovery on its own, which afaik it doesn't, they shouldn't form a cluster.

Jan 17 '22 09:01 zimnx

While going through the Kubernetes audit logs, I found that, on upgrade, one of the pods got a recycled PodIP previously used by the other cluster. Maybe gossip was using it for membership and that's how those clusters got connected. Normally Pod traffic is isolated between namespaces, but not in minikube.

Jan 17 '22 13:01 tnozicka

I am lowering the priority and scheduling it for 1.8. This has always been the case, so it's not a regression in the operator. Unfortunately, we use the ScyllaCluster name for cluster identity without the namespace. Migrating it over in a backwards-compatible manner, while also addressing the preexisting clusters, would take extreme effort. As we were already planning to set up mTLS for Scylla nodes, we are going to aim for that as the actual fix.

The workaround to avoid hitting this issue is to name your ScyllaClusters uniquely across namespaces, or to avoid upgrading (or otherwise replacing pods) for more than one cluster at a time.
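To illustrate why same-named clusters can collide, here is a rough Go sketch, not the operator's actual code. It assumes both e2e clusters are named `basic` (inferred from the Service names in the report), and the type `clusterRef` and both identity functions are hypothetical. The namespaced variant only shows what a unique identity would look like; as noted above, the planned fix is mTLS rather than changing the identity.

```go
package main

import "fmt"

// clusterRef is a hypothetical stand-in for a ScyllaCluster's namespace and name.
type clusterRef struct {
	Namespace string
	Name      string
}

// nameOnlyIdentity models the behaviour described above: the cluster identity
// is derived from the ScyllaCluster name and ignores the namespace.
func nameOnlyIdentity(c clusterRef) string {
	return c.Name
}

// namespacedIdentity is a hypothetical alternative that stays unique across
// namespaces. It is illustrative only and not what the operator does.
func namespacedIdentity(c clusterRef) string {
	return c.Namespace + "/" + c.Name
}

func main() {
	// Cluster name "basic" is an assumption inferred from the Service names.
	a := clusterRef{Namespace: "e2e-test-scyllacluster-kx44s-b4mlz", Name: "basic"}
	b := clusterRef{Namespace: "e2e-test-scyllacluster-9t28c-jdmfm", Name: "basic"}

	// With name-only identity, the two clusters are indistinguishable.
	fmt.Println(nameOnlyIdentity(a) == nameOnlyIdentity(b)) // true

	// With a namespace-qualified identity they would differ.
	fmt.Println(namespacedIdentity(a) == namespacedIdentity(b)) // false
}
```

Under the name-only scheme, a node that reaches a recycled PodIP belonging to the other namespace has no identity mismatch to reject it on, which is consistent with the workaround of choosing unique names.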

Jan 19 '22 14:01 tnozicka

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 30d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out

/lifecycle stale

Jun 23 '24 10:06 scylla-operator-bot[bot]

tracked in https://github.com/scylladb/scylla-operator/issues/928

Jun 24 '24 09:06 tnozicka
