scylla-operator
Errors like `alternator: get node info: no host config available` and `CQL: no host config available` when running `sctool status` after an update
What happened?
After updating Scylla from 5.2.9 to 5.4.7, Scylla Operator from 1.9.x to 1.12.2 (the latest that supports both Scylla 5.2.x and 5.4.x), and Scylla Manager from 3.1.x to 3.2.8, we observed that `sctool status` no longer provides all the node info and returns errors:
$ kubectl exec -it deployments/scylla-manager -n scylla-manager -- sctool status --cluster scylla/scylla
Datacenter: XXX
+----+-------------+-------------+----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
|    | Alternator  | CQL         | REST     | Address      | Uptime | CPUs | Memory | Scylla | Agent | Host ID                              |
+----+-------------+-------------+----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
| UN | ERROR (0ms) | ERROR (0ms) | UP (0ms) | 10.7.241.130 | -      | -    | -      | -      | -     | 8a24c600-5525-490e-a3cd-314f6062d6a1 |
| UN | ERROR (0ms) | ERROR (0ms) | UP (6ms) | 10.7.241.174 | -      | -    | -      | -      | -     | f14fcd59-8d90-4d8e-af22-ace87ceced22 |
| UN | ERROR (0ms) | ERROR (0ms) | UP (1ms) | 10.7.241.175 | -      | -    | -      | -      | -     | 050dcc67-7bb8-4d5d-89b1-5dbe0bcbb8b2 |
| UN | ERROR (0ms) | ERROR (0ms) | UP (5ms) | 10.7.243.109 | -      | -    | -      | -      | -     | 4a3ff045-bba2-4537-a4d7-a213d25ae713 |
| UN | ERROR (0ms) | ERROR (0ms) | UP (1ms) | 10.7.248.124 | -      | -    | -      | -      | -     | 028023f5-9d4e-404c-8537-467ac3d4538c |
| UN | ERROR (0ms) | ERROR (0ms) | UP (1ms) | 10.7.249.238 | -      | -    | -      | -      | -     | b8f68c62-c462-4a30-a505-5ece9ae1ab0b |
| UN | ERROR (0ms) | ERROR (0ms) | UP (0ms) | 10.7.252.229 | -      | -    | -      | -      | -     | 1ff1b8df-7a90-4321-a309-7cd69e20bd70 |
+----+-------------+-------------+----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
Errors:
- 10.7.241.130 alternator: get node info: no host config available
- 10.7.241.130 CQL: no host config available
- 10.7.241.174 alternator: get node info: no host config available
- 10.7.241.174 CQL: no host config available
- 10.7.241.175 alternator: get node info: no host config available
- 10.7.241.175 CQL: no host config available
- 10.7.243.109 alternator: get node info: no host config available
- 10.7.243.109 CQL: no host config available
- 10.7.248.124 alternator: get node info: no host config available
- 10.7.248.124 CQL: no host config available
- 10.7.249.238 alternator: get node info: no host config available
- 10.7.249.238 CQL: no host config available
- 10.7.252.229 alternator: get node info: no host config available
- 10.7.252.229 CQL: no host config available
Note that our scylla.yaml didn't have any config for TLS up to that point.
This problem has been worked around by setting this:
client_encryption_options:
  optional: true
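A minimal sketch of how this workaround might be applied, assuming scylla.yaml is delivered through a ConfigMap referenced by the ScyllaCluster racks. The ConfigMap name `scylla-config`, the function names, and the local file path are assumptions for illustration, not taken from the setup described here; the file content combines the original settings from the reproduction steps with the workaround.

```shell
# Write a scylla.yaml containing the original settings plus the workaround.
# $1 is the path of the file to create.
write_scylla_yaml() {
  cat > "$1" <<'EOF'
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
consistent_cluster_management: true
client_encryption_options:
  optional: true
EOF
}

# Publish the file as a ConfigMap under the key "scylla.yaml".
# Assumption: the racks reference a ConfigMap named "scylla-config"
# in the "scylla" namespace; adjust both to match the actual cluster.
apply_scylla_config() {
  kubectl -n scylla create configmap scylla-config \
    --from-file=scylla.yaml="$1" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

Calling `write_scylla_yaml ./scylla.yaml` followed by `apply_scylla_config ./scylla.yaml` would update the ConfigMap. Scylla reads scylla.yaml at startup, so the change presumably takes effect only after the pods are restarted.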
However, we still have a problem with the Scylla Manager's cluster:
$ kubectl exec -it deployments/scylla-manager -n scylla-manager -- sctool status --cluster scylla-manager/scylla-manager
Datacenter: manager-dc
+----+-------------+-------------+-----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
|    | Alternator  | CQL         | REST      | Address      | Uptime | CPUs | Memory | Scylla | Agent | Host ID                              |
+----+-------------+-------------+-----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
| UN | ERROR (0ms) | ERROR (0ms) | UP (92ms) | 10.7.255.190 | -      | -    | -      | -      | -     | 8ec8a729-8225-4278-a9da-ad0f23f47e01 |
+----+-------------+-------------+-----------+--------------+--------+------+--------+--------+-------+--------------------------------------+
Errors:
- 10.7.255.190 alternator: get node info: no host config available
- 10.7.255.190 CQL: no host config available
...and it seems to only have a generated ConfigMap named scylladb-managed-config:
apiVersion: v1
data:
  scylladb-managed-config.yaml: |
    cluster_name: "scylla"
    rpc_address: "0.0.0.0"
    endpoint_snitch: "GossipingPropertyFileSnitch"
    internode_compression: "all"
    native_transport_port_ssl: 9142
    native_shard_aware_transport_port_ssl: 19142
    client_encryption_options:
      enabled: true
      optional: false
      certificate: "/var/run/secrets/scylla-operator.scylladb.com/scylladb/serving-certs/tls.crt"
      keyfile: "/var/run/secrets/scylla-operator.scylladb.com/scylladb/serving-certs/tls.key"
      require_client_auth: true
      truststore: "/var/run/secrets/scylla-operator.scylladb.com/scylladb/client-ca/tls.crt"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: scylla
    meta.helm.sh/release-namespace: scylla
    scylla-operator.scylladb.com/managed-hash: <redacted>==
  creationTimestamp: "<redacted>"
  labels:
    app.kubernetes.io/managed-by: Helm
    scylla/cluster: scylla
  name: scylla-managed-config
  namespace: scylla
  ownerReferences:
  - apiVersion: scylla.scylladb.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: ScyllaCluster
    name: scylla
    uid: <redacted>
  resourceVersion: "<redacted>"
  uid: <redacted>
...and I can't find anything about modifying it in the Helm documentation at https://operator.docs.scylladb.com/stable/helm.html.
Since then we have updated Scylla to 5.4.9, Operator to 1.13.0, and Manager to 3.3.0, but it did not help.
What did you expect to happen?
`sctool status` should work without errors for both the main cluster and Scylla Manager's own cluster after an update.
I shouldn't have to reconfigure TLS as the defaults shown in https://github.com/scylladb/scylladb/blob/scylla-5.4.7/conf/scylla.yaml#L472-L474 say that it should be disabled.
How can we reproduce it (as minimally and precisely as possible)?
- Set up the pre-update versions mentioned above
- Use this scylla.yaml, as we had before:
  read_request_timeout_in_ms: 5000
  write_request_timeout_in_ms: 2000
  cas_contention_timeout_in_ms: 1000
  consistent_cluster_management: true
- Update to the newer versions mentioned above
- Check `sctool status`
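The final check could be scripted roughly as follows. This is only a sketch: the helper name `report_status_errors` is hypothetical, and it assumes the `sctool status` output keeps the layout shown above (an `Errors:` line followed by `- <address> <check>: <message>` entries).

```shell
# Reads `sctool status` output on stdin and prints "<address> <check>" for
# every entry listed under an "Errors:" section.
# Returns 1 if at least one error entry was found, 0 otherwise.
report_status_errors() {
  awk '
    /^Datacenter:/ { in_errors = 0 }        # a new datacenter block starts
    /^Errors:/     { in_errors = 1; next }  # the error list starts
    in_errors && /^- / {
      sub(/:$/, "", $3)                     # "CQL:" becomes "CQL"
      print $2, $3
      found = 1
    }
    END { exit found ? 1 : 0 }
  '
}
```

It can be fed from the command used above with the `-it` flags dropped, since the output is piped rather than shown on a terminal: `kubectl exec deployments/scylla-manager -n scylla-manager -- sctool status --cluster scylla/scylla | report_status_errors`.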
Scylla Operator version
1.13.0
Kubernetes platform name and version
$ kubectl version
Client Version: v1.29.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.5-gke.1192000
Please attach the must-gather archive.
scylla-operator-must-gather-77t6kvnghzss.zip
Anything else we need to know?
I additionally anonymized the must-gather archive manually, see https://github.com/scylladb/scylla-operator/issues/2015.
This problem was originally reported in https://github.com/scylladb/scylla-manager/issues/3889, but that issue was about a (probably?) different problem, so it was suggested that I create a new one.