Adding or removing Reaper or Stargate to or from a single-node cluster causes it to lose readiness
What happened? I deployed a K8ssandraCluster with a single Cassandra node. After the Cassandra pod became ready, I enabled Reaper. This caused a restart of the Cassandra pod, and Cassandra then failed to start with these messages in the logs:
WARN [main] 2022-02-17 14:14:42,212 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-seed-service
WARN [main] 2022-02-17 14:14:42,222 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-dc1-additional-seed-service
ERROR [main] 2022-02-17 14:14:42,225 CassandraDaemon.java:909 - Exception encountered during startup: The seed provider lists no seeds
When the Cassandra pod is restarted, it no longer has the cassandra.datastax.com/seed-node: "true" label, which causes the error message in the logs.
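To verify this, you can inspect the labels on the restarted pod. The pod name test-dc1-default-sts-0 is an assumption based on the cluster and datacenter names used below; yours may differ:

    # Show all labels on the pod; cassandra.datastax.com/seed-node should be missing after the restart
    kubectl get pod test-dc1-default-sts-0 --show-labels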
cass-operator doesn't apply the seed label again, so we get stuck. Fortunately, there are a couple of easy workarounds. You can delete the pod again, and cass-operator should then apply the seed label. Alternatively, you can edit/patch the pod and add the seed label back.
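For example, with the same assumed pod name, either workaround can be applied with kubectl:

    # Workaround 1: delete the pod; cass-operator should re-apply the seed label on the replacement pod
    kubectl delete pod test-dc1-default-sts-0

    # Workaround 2: patch the stuck pod and add the seed label back manually
    kubectl label pod test-dc1-default-sts-0 cassandra.datastax.com/seed-node=true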
This behavior occurs when enabling or disabling Reaper or Stargate when there is only a single Cassandra pod.
Did you expect to see something different? The seed label should get applied again so Cassandra starts up.
How to reproduce it (as minimally and precisely as possible): Create this K8ssandraCluster:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  auth: false
  cassandra:
    serverVersion: "4.0.1"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 512Mi
    networking:
      hostNetwork: true
    datacenters:
      - metadata:
          name: dc1
        # stargate:
        #   size: 1
        #   heapSize: 512Mi
        #   resources:
        #     limits:
        #       memory: 1024Mi
After the Cassandra pod is ready, enable Stargate to observe the issue.
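As a sketch of that step (the resource name test comes from the manifest above), Stargate can be enabled by editing the K8ssandraCluster and uncommenting the stargate block:

    # Uncomment the stargate section under dc1, then save to trigger the Cassandra pod restart
    kubectl edit k8ssandracluster test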
Environment
- K8ssandra Operator version: v1.0.0
This issue seems to be triggered by any operation that involves a C* pod restart, including configuration changes. It also reproduces on a local setup on sleep/wake, as C* wakes up with a node down.
Moved the message to a separate issue: https://github.com/k8ssandra/k8ssandra-operator/issues/567
Have you tested with v1.3.0 to see if this is still an issue?