
Adding or removing Reaper or Stargate to or from a single-node cluster causes it to lose readiness

Open jsanda opened this issue 3 years ago • 3 comments

What happened? I deployed a K8ssandraCluster with a single Cassandra node. After the Cassandra pod became ready, I enabled Reaper. This causes a restart of the Cassandra pod. Cassandra fails startup with these messages in the logs:

WARN  [main] 2022-02-17 14:14:42,212 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-seed-service
WARN  [main] 2022-02-17 14:14:42,222 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-dc1-additional-seed-service
ERROR [main] 2022-02-17 14:14:42,225 CassandraDaemon.java:909 - Exception encountered during startup: The seed provider lists no seeds

When the Cassandra pod is restarted, it no longer has the cassandra.datastax.com/seed-node: "true" label, which causes the error messages in the logs.
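A quick way to confirm is to list the pod's labels; the pod name test-dc1-default-sts-0 below is an assumption based on the usual cluster/dc/rack statefulset naming and may differ in your environment:

# After the restart, cassandra.datastax.com/seed-node should be missing from the label list
kubectl get pod test-dc1-default-sts-0 --show-labels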

cass-operator doesn't apply the seed label again, so we get stuck. Fortunately, there are a couple of easy workarounds. You can delete the pod again; cass-operator should then apply the seed label. Alternatively, you can edit/patch the pod and add the seed label yourself.
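For example, something along these lines should work (again assuming the hypothetical pod name test-dc1-default-sts-0):

# Workaround 1: delete the pod so cass-operator recreates it and re-applies the seed label
kubectl delete pod test-dc1-default-sts-0

# Workaround 2: add the seed label to the existing pod yourself
kubectl label pod test-dc1-default-sts-0 cassandra.datastax.com/seed-node=true --overwrite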

This behavior occurs when enabling or disabling Reaper or Stargate when there is only a single Cassandra pod.

Did you expect to see something different? The seed label should get applied again so Cassandra starts up.

How to reproduce it (as minimally and precisely as possible): Create this K8ssandraCluster:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  auth: false
  cassandra:
    serverVersion: "4.0.1"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 512Mi
    networking:
      hostNetwork: true
    datacenters:
      - metadata:
          name: dc1
  #stargate:
    #size: 1
    #heapSize: 512Mi
    #resources:
      #limits:
        #memory: 1024Mi

After the Cassandra pod is ready, enable Stargate (uncomment the stargate block above) to observe the issue.
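Instead of editing the manifest, Stargate can also be enabled with a patch along these lines (a sketch; the field values are taken from the commented-out block above):

kubectl patch k8ssandracluster test --type merge \
  -p '{"spec":{"stargate":{"size":1,"heapSize":"512Mi"}}}'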

Environment

  • K8ssandra Operator version:

    v1.0.0

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: K8OP-136

jsanda avatar Feb 18 '22 04:02 jsanda

This issue seems to be triggered by any operation that involves a C* pod restart, including configuration changes. It also reproduces on a local setup after sleep/wake, as C* wakes up with the node down.

andrey-dubnik avatar May 20 '22 07:05 andrey-dubnik

Moved the message to a separate issue: https://github.com/k8ssandra/k8ssandra-operator/issues/567

Tom910 avatar Jun 14 '22 11:06 Tom910

Have you tested with v1.3.0 to see if this is still an issue?

jsanda avatar Oct 25 '22 22:10 jsanda