scylla-operator K8S services as rpc_address causes ConnectionException for clients outside the K8S cluster.

I set up a ScyllaDB cluster with a recompiled operator v0.2.4 (including this PR: https://github.com/scylladb/scylla-operator/pull/195). The cluster runs on EKS (AWS, a single region, 2 AZs, 3 nodes per AZ, one dedicated node per pod with hostNetworking: true). Some apps will use that cluster from outside EKS, so I put a Network Load Balancer (listener on 9042) in front of a K8S service that I created myself and that selects all the Scylla pods.

My custom scylla.yaml has only one line:

endpoint_snitch: Ec2Snitch

To simulate a client outside K8S, I use a tool like cassandra-stress, launching this command:

cassandra-stress write n=10000 -rate threads=24 -node scylladb.nlb.endpoint -mode native cql3

I get this error:

Datatacenter: us-west; Host: scylladb.lb.endpoint/10.10.10.10; Rack: 1b
Datatacenter: us-west; Host: /172.20.161.58; Rack: 1a
Datatacenter: us-west; Host: /172.20.127.61; Rack: 1a
Datatacenter: us-west; Host: /172.20.174.141; Rack: 1b
Datatacenter: us-west; Host: /172.20.136.6; Rack: 1a
WARN  19:46:52,258 Error creating pool to /172.20.161.58:9042
com.datastax.driver.core.exceptions.ConnectionException: [/172.20.161.58] Pool was closed during initialization

172.X.X.X are K8S service IPs, not pod IPs.

I launch the client from the same subnet as the K8S nodes, so it can reach IPs in that subnet, but it receives K8S service IPs back, which it cannot reach.
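
To make the mismatch easy to spot, here is a minimal Python sketch that scans cassandra-stress host lines and flags advertised IPs that fall outside the networks the client can reach. The CIDR 10.10.0.0/16 is an assumption for illustration only (it is not taken from my real setup), and the function name is hypothetical.

```python
import ipaddress
import re

# Assumed reachable range for the client; replace with your own VPC/subnet CIDRs.
REACHABLE_NETWORKS = [ipaddress.ip_network("10.10.0.0/16")]

# Matches the "Host: name/ip;" part of a cassandra-stress topology line.
HOST_LINE = re.compile(r"Host: (?P<name>[^/;]*)/(?P<ip>[0-9.]+);")

def unreachable_hosts(stress_output):
    """Return advertised IPs that are outside every reachable network."""
    flagged = []
    for match in HOST_LINE.finditer(stress_output):
        ip = ipaddress.ip_address(match.group("ip"))
        if not any(ip in net for net in REACHABLE_NETWORKS):
            flagged.append(str(ip))
    return flagged

sample = """\
Datatacenter: us-west; Host: scylladb.lb.endpoint/10.10.10.10; Rack: 1b
Datatacenter: us-west; Host: /172.20.161.58; Rack: 1a
Datatacenter: us-west; Host: /172.20.127.61; Rack: 1a
"""
print(unreachable_hosts(sample))
```

With the sample above, only the two 172.20.x.x addresses are flagged; the contact point behind the load balancer is not.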

Questions and doubts:

1. What's the advantage of having a 1:1 relation between pod and K8S service? I had to manually create a K8S service that references all the Scylla pods; the operator doesn't create one.
2. Is there a way with the operator to avoid using K8S services as rpc_address?
3. Suggestions on how to use the operator for my use case (communicating with the Scylla cluster from outside the K8S cluster) are welcome.

Oct 12 '20 10:10 nbenaglia

I will try to provide some answers but would like confirmation from scylla-operator developers.

1. Pod IPs are generally ephemeral. The k8s service IP is not ephemeral, and new pods can re-attach to it. This was done in preparation for upgrades, for example: you would keep the storage but deploy a new pod with a new Scylla patch version, and Scylla would keep the k8s service IP.

2/3. It is OK to use the k8s service IP as listen_address and rpc_address. If you need external clients to access the nodes (and even in preparation for a multi-DC, multi-region deployment), you should use broadcast_address and broadcast_rpc_address as per https://docs.scylladb.com/operating-scylla/admin/#advanced-networking, similarly to what you would do with a regular deployment on AWS: https://docs.scylladb.com/operating-scylla/procedures/cluster-management/ec2_dc/#ec2-configuration-table

IMHO we need to add some features to the operator to support these things out of the box.

Oct 12 '20 15:10 gnumoreno

I added broadcast_address and broadcast_rpc_address to the Scylla ConfigMap (I create one ConfigMap per node), like this:

endpoint_snitch: Ec2Snitch
broadcast_address: 10.10.10.10
broadcast_rpc_address: 10.10.10.10

where 10.10.10.10=node_address=pod_address.

The K8S service IPs are still sent back to the client (cassandra-stress), and the above error appears.

Oct 12 '20 18:10 nbenaglia

@nbenaglia setting broadcast_address and broadcast_rpc_address in the config map doesn't have any effect, since these parameters are configured by the Operator via the CLI, which takes precedence over the config file.

Both of these values are set to spec.StaticIP of the member service in front of each Scylla node. This is wrong for externally available services. There should be a switch in ScyllaCluster that tells the Operator whether Scylla should be exposed externally or stay internal, so that it can set the broadcast addresses to one of the available external IPs.
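
As a rough illustration of that precedence (not the Operator's or Scylla's actual code), the Python sketch below merges a config file with command-line flags so that the flags win. The function name is hypothetical, the key names are simplified to the scylla.yaml spelling, and the 172.20.161.58 value is just one of the service IPs from the log above used as a stand-in.

```python
def effective_config(file_config, cli_flags):
    """Merge settings so that command-line flags override config-file values."""
    merged = dict(file_config)   # start from the config file
    merged.update(cli_flags)     # CLI values replace any file values with the same key
    return merged

# Values from the user's config map.
file_config = {
    "endpoint_snitch": "Ec2Snitch",
    "broadcast_address": "10.10.10.10",
    "broadcast_rpc_address": "10.10.10.10",
}

# Assumed values passed on the command line (the member service IP).
cli_flags = {
    "broadcast_address": "172.20.161.58",
    "broadcast_rpc_address": "172.20.161.58",
}

result = effective_config(file_config, cli_flags)
print(result["broadcast_rpc_address"])
```

The snitch setting survives because nothing on the command line overrides it, while both broadcast addresses end up as the service IP, which matches what the client observed.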

Moreno explained why we have multiple services. In addition, these services are needed for drivers which establish connections to each cluster node independently in order to support shard awareness.

Oct 19 '20 14:10 zimnx

Some background on why we have a Service per Member: https://github.com/rook/rook/blob/9b40f17ec49bedfc272e3f100896527155a481e2/design/cassandra/design.md#major-pain-point-stable-pod-identity

Essentially, they enable us to guarantee stable IPs and not care about bookkeeping IPs and UUIDs. In practice, we've seen that this method has some issues, mainly:

The ClusterIPs are not available outside the cluster.
It's hard to set up multi-region clusters that way. It requires a LoadBalancer per member, which is expensive.

The alternative is to have some bookkeeping on the UUIDs/IPs. Please check the aforementioned design doc, on the possible race condition of member join. Perhaps we could implement a careful protocol for member join in the operator, to work around the fact that the operator doesn't know the member's UUID beforehand. E.g.:

1. Sidecar notes IP address in etcd before starting, for the particular member.
2. Sidecar starts and notes UUID in etcd once the member starts.
3. If the member is lost (lost disk) before the sidecar can note its UUID, it will look for a member with the IP noted in step 1, note its UUID in etcd and assume its UUID if found (with replace_address).

We could introduce additional ScyllaMember CRs, where each Member would do its bookkeeping.

This is more involved but it will allow the operator to work without needing static member IPs. The procedure may contain mistakes as it's off the top of my head.
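
A minimal Python sketch of the three steps, assuming an in-memory dict as a stand-in for etcd. The class, method names, member name, and host ID string are all hypothetical; this only illustrates the bookkeeping idea and is not an operator design.

```python
class MemberRegistry:
    """In-memory stand-in for the per-member records kept in etcd (hypothetical)."""

    def __init__(self):
        # member name -> {"ip": str, "uuid": str or None}
        self.records = {}

    def note_ip(self, member, ip):
        # Step 1: record the IP before the member starts.
        self.records[member] = {"ip": ip, "uuid": None}

    def note_uuid(self, member, host_id):
        # Step 2: record the UUID once the member has started.
        self.records[member]["uuid"] = host_id

    def recover(self, member, cluster_view):
        """Step 3: if the UUID was never noted, look it up by the IP from step 1.

        cluster_view maps IP -> UUID as reported by the rest of the cluster.
        Returns the UUID to assume (with replace_address), or None if the
        cluster never saw a member at that IP.
        """
        record = self.records[member]
        if record["uuid"] is None:
            record["uuid"] = cluster_view.get(record["ip"])
        return record["uuid"]


registry = MemberRegistry()
registry.note_ip("member-0", "10.10.10.10")
# The member joins, but its disk is lost before step 2 can run.
cluster_view = {"10.10.10.10": "host-id-known-to-the-cluster"}
print(registry.recover("member-0", cluster_view))
```

The sketch ignores concurrency and failures while writing the records, which is where the race condition mentioned in the design doc would have to be handled.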

Oct 19 '20 19:10 yanniszark

In case it's relevant, there is a proposal to make Scylla use UUIDs for nodes instead of IPs.

Oct 19 '20 19:10 dorlaor

Same problem here. I deployed Scylla in GKE, and I want my processes to be able to access the Scylla nodes from within the same VPC. But the scylla-operator sets broadcast-rpc-address to the ClusterIP, which is not accessible from outside the cluster. Can we just set it to the PodIP?

Nov 17 '21 07:11 rueian

Hi all, this is a pretty big blocker for us and we would be happy to contribute the implementation if you would provide some pointers to what you had in mind for how this could work in the operator.

Jan 11 '23 22:01 iravid

Exposing clusters to external networks is possible as of Scylla Operator 1.11.0: https://operator.docs.scylladb.com/stable/exposing.html

Nov 09 '23 12:11 zimnx
