k8ssandra-operator

DB Down After Node Restart

dpaks opened this issue 2 years ago • 3 comments

What happened?
I have a single-node microk8s cluster running Cassandra. Every time I restart the node, the Cassandra StatefulSet ends up in a non-ready state, and the same happens with Stargate.

Did you expect to see something different?
A node restart should be seamless; everything should come back up within a few minutes.
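
The non-ready state can be confirmed with commands like these (only a sketch; the namespace and label values assume the manifest included below):

kubectl -n k8ssandra-operator get pods -l cassandra.datastax.com/cluster=dc1
kubectl -n k8ssandra-operator get statefulsets
kubectl -n k8ssandra-operator get k8ssandracluster dc1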

How to reproduce it (as minimally and precisely as possible):
Restart the node.

Environment
microk8s

Linux 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • K8ssandra Operator version:

docker.io/k8ssandra/k8ssandra-operator:v1.6.1
docker.io/k8ssandra/cass-operator:v1.15.0

  • Kubernetes version information:

    kubectl version

    Kustomize Version: v5.0.1
    Server Version: v1.27.0

  • Kubernetes cluster kind:

microk8s

  • Manifests:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: dc1
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: "4.0.1"
    datacenters:
      - metadata:
          name: dc1
        size: 1
        storageConfig:
          cassandraDataVolumeClaimSpec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
        config:
          jvmOptions:
            heapSize: 512M
        stargate:
          size: 1
          heapSize: 256M
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: cassandra.datastax.com/cluster
                        operator: In
                        values:
                          - dc1
                  topologyKey: kubernetes.io/hostname
  • K8ssandra Operator Logs:
2023-05-12T15:44:21.221Z	INFO	setup	watch namespace configured	{"namespace": "k8ssandra-operator"}
2023-05-12T15:44:21.221Z	INFO	setup	watch namespace configured	{"namespace": "k8ssandra-operator"}
2023-05-12T15:44:21.424Z	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
2023-05-12T15:44:21.629Z	DEBUG	Finished initializing 0 client configs
2023-05-12T15:44:21.630Z	INFO	controller-runtime.builder	Registering a mutating webhook	{"GVK": "k8ssandra.io/v1alpha1, Kind=K8ssandraCluster", "path": "/mutate-k8ssandra-io-v1alpha1-k8ssandracluster"}
2023-05-12T15:44:21.630Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/mutate-k8ssandra-io-v1alpha1-k8ssandracluster"}
2023-05-12T15:44:21.630Z	INFO	controller-runtime.builder	Registering a validating webhook	{"GVK": "k8ssandra.io/v1alpha1, Kind=K8ssandraCluster", "path": "/validate-k8ssandra-io-v1alpha1-k8ssandracluster"}
2023-05-12T15:44:21.630Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-k8ssandra-io-v1alpha1-k8ssandracluster"}
2023-05-12T15:44:21.633Z	INFO	controller-runtime.builder	skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called	{"GVK": "medusa.k8ssandra.io/v1alpha1, Kind=MedusaBackupSchedule"}
2023-05-12T15:44:21.633Z	INFO	controller-runtime.builder	Registering a validating webhook	{"GVK": "medusa.k8ssandra.io/v1alpha1, Kind=MedusaBackupSchedule", "path": "/validate-medusa-k8ssandra-io-v1alpha1-medusabackupschedule"}
2023-05-12T15:44:21.633Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/validate-medusa-k8ssandra-io-v1alpha1-medusabackupschedule"}
2023-05-12T15:44:21.633Z	INFO	controller-runtime.webhook	Registering webhook	{"path": "/mutate-v1-pod-secrets-inject"}
2023-05-12T15:44:21.633Z	INFO	setup	starting manager
2023-05-12T15:44:21.633Z	INFO	controller-runtime.webhook.webhooks	Starting webhook server
2023-05-12T15:44:21.633Z	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2023-05-12T15:44:21.633Z	INFO	controller-runtime.certwatcher	Updated current TLS certificate
2023-05-12T15:44:21.633Z	INFO	Starting EventSource	{"controller": "clientconfig", "controllerGroup": "config.k8ssandra.io", "controllerKind": "ClientConfig", "source": "kind source: *v1beta1.ClientConfig"}
2023-05-12T15:44:21.633Z	INFO	Starting EventSource	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret", "source": "kind source: *v1alpha1.ReplicatedSecret"}
2023-05-12T15:44:21.633Z	INFO	Starting EventSource	{"controller": "clientconfig", "controllerGroup": "config.k8ssandra.io", "controllerKind": "ClientConfig", "source": "kind source: *v1.Secret"}
2023-05-12T15:44:21.633Z	INFO	Starting Controller	{"controller": "clientconfig", "controllerGroup": "config.k8ssandra.io", "controllerKind": "ClientConfig"}
2023-05-12T15:44:21.633Z	INFO	Starting EventSource	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret", "source": "kind source: *v1.Secret"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret"}
2023-05-12T15:44:21.634Z	INFO	controller-runtime.webhook	Serving webhook server	{"host": "", "port": 9443}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "source": "kind source: *v1alpha1.K8ssandraCluster"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "source": "kind source: *v1beta1.CassandraDatacenter"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "source": "kind source: *v1alpha1.Stargate"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "source": "kind source: *v1alpha1.Reaper"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "source": "kind source: *v1.ConfigMap"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "source": "kind source: *v1alpha1.Stargate"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "source": "kind source: *v1.Deployment"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "source": "kind source: *v1.Service"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate"}
2023-05-12T15:44:21.634Z	INFO	controller-runtime.certwatcher	Starting certificate watcher
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "K8ssandraTask", "source": "kind source: *v1alpha1.K8ssandraTask"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "medusatask", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaTask", "source": "kind source: *v1alpha1.MedusaTask"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "medusatask", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaTask"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "k8ssandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "K8ssandraTask", "source": "kind source: *v1alpha1.CassandraTask"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "k8ssandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "K8ssandraTask"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "reaper", "controllerGroup": "reaper.k8ssandra.io", "controllerKind": "Reaper", "source": "kind source: *v1alpha1.Reaper"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "reaper", "controllerGroup": "reaper.k8ssandra.io", "controllerKind": "Reaper", "source": "kind source: *v1.Deployment"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "reaper", "controllerGroup": "reaper.k8ssandra.io", "controllerKind": "Reaper", "source": "kind source: *v1.Service"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "reaper", "controllerGroup": "reaper.k8ssandra.io", "controllerKind": "Reaper"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "source": "kind source: *v1alpha1.MedusaRestoreJob"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob", "source": "kind source: *v1alpha1.MedusaBackupJob"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob"}
2023-05-12T15:44:21.634Z	INFO	Starting EventSource	{"controller": "medusabackupschedule", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupSchedule", "source": "kind source: *v1alpha1.MedusaBackupSchedule"}
2023-05-12T15:44:21.634Z	INFO	Starting Controller	{"controller": "medusabackupschedule", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupSchedule"}
2023-05-12T15:44:21.635Z	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
2023-05-12T15:44:21.734Z	INFO	Starting workers	{"controller": "clientconfig", "controllerGroup": "config.k8ssandra.io", "controllerKind": "ClientConfig", "worker count": 1}
2023-05-12T15:44:21.735Z	INFO	Starting workers	{"controller": "medusarestorejob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaRestoreJob", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "k8ssandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "K8ssandraTask", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "reaper", "controllerGroup": "reaper.k8ssandra.io", "controllerKind": "Reaper", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting reconciliation	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret", "ReplicatedSecret": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "242264dc-3b68-4231-963c-82937e38f543", "key": {"namespace": "k8ssandra-operator", "name": "dc1"}}
2023-05-12T15:44:21.736Z	INFO	Fetching Stargate resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "medusatask", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaTask", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "medusabackupschedule", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupSchedule", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Starting workers	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "worker count": 1}
2023-05-12T15:44:21.736Z	INFO	Fetching CassandraDatacenter resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "CassandraDatacenter": {"namespace": "k8ssandra-operator", "name": "dc1"}}
2023-05-12T15:44:21.736Z	INFO	Reconciling Stargate configmap	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476"}
2023-05-12T15:44:21.736Z	INFO	Reconciling Stargate Cassandra yaml configMap on namespace k8ssandra-operator for cluster dc1 and dc dc1	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476"}
2023-05-12T15:44:21.736Z	INFO	Stargate ConfigMap successfully reconciled	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "StargateConfigMap": "k8ssandra-operator/dc1-dc1-cassandra-config"}
2023-05-12T15:44:21.736Z	INFO	Stargate Vector Agent ConfigMap reconciliation complete	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476"}
2023-05-12T15:44:21.736Z	INFO	Reconciling Medusa user secrets	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1"}
2023-05-12T15:44:21.737Z	INFO	Medusa user secrets successfully reconciled	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1"}
2023-05-12T15:44:21.737Z	INFO	Reconciling replicated secrets	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1"}
2023-05-12T15:44:21.737Z	INFO	Initial token computation could not be performed or is not required in this cluster	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "error": "cannot compute initial tokens: at least one DC has num_tokens >= 16"}
2023-05-12T15:44:21.737Z	INFO	Deleting Stargate desired deployment	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "Deployment": "dc1-dc1-default-stargate-deployment"}
2023-05-12T15:44:21.737Z	INFO	reconciling telemetry	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "stargate": "dc1-dc1-stargate"}
2023-05-12T15:44:21.743Z	INFO	Starting reconciliation	{"controller": "replicatedsecret", "controllerGroup": "replication.k8ssandra.io", "controllerKind": "ReplicatedSecret", "ReplicatedSecret": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "7e8b6964-2cb4-437e-b28c-225f41886239", "key": {"namespace": "k8ssandra-operator", "name": "dc1"}}
2023-05-12T15:44:21.838Z	INFO	Medusa reconcile for dc1 on namespace k8ssandra-operator	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "CassandraDatacenter": "k8ssandra-operator/dc1", "K8SContext": ""}
2023-05-12T15:44:21.838Z	INFO	Medusa is not enabled	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "CassandraDatacenter": "k8ssandra-operator/dc1", "K8SContext": ""}
2023-05-12T15:44:21.838Z	INFO	Vector Agent ConfigMap successfully reconciled	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "CassandraDatacenter": "k8ssandra-operator/dc1", "K8SContext": ""}
2023-05-12T15:44:21.841Z	INFO	Reconciling seeds	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "CassandraDatacenter": "k8ssandra-operator/dc1", "K8SContext": ""}
I0512 15:44:22.772853       1 request.go:682] Waited for 1.030831033s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/apis/cert-manager.io/v1?timeout=32s
2023-05-12T15:44:23.124Z	INFO	Waiting for deployments to be rolled out	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "b0c5cb75-a9bb-4568-bb6d-da45a4b1a476", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:44:23.225Z	INFO	CassandraDatacenter is being updated. Requeuing the reconcile.	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1", "CassandraDatacenter": "k8ssandra-operator/dc1", "K8SContext": "", "Generation": 2, "ObservedGeneration": 1}
2023-05-12T15:44:23.244Z	INFO	updated k8ssandracluster status	{"controller": "k8ssandracluster", "controllerGroup": "k8ssandra.io", "controllerKind": "K8ssandraCluster", "K8ssandraCluster": {"name":"dc1","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1", "reconcileID": "03a2ceb1-f524-4f92-a2fe-584b810660bc", "K8ssandraCluster": "k8ssandra-operator/dc1"}
2023-05-12T15:44:38.124Z	INFO	Fetching Stargate resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:44:38.124Z	INFO	Fetching CassandraDatacenter resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "CassandraDatacenter": {"namespace": "k8ssandra-operator", "name": "dc1"}}
2023-05-12T15:44:38.124Z	INFO	Reconciling Stargate configmap	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad"}
2023-05-12T15:44:38.125Z	INFO	Reconciling Stargate Cassandra yaml configMap on namespace k8ssandra-operator for cluster dc1 and dc dc1	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad"}
2023-05-12T15:44:38.125Z	INFO	Stargate ConfigMap successfully reconciled	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "StargateConfigMap": "k8ssandra-operator/dc1-dc1-cassandra-config"}
2023-05-12T15:44:38.125Z	INFO	Stargate Vector Agent ConfigMap reconciliation complete	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad"}
2023-05-12T15:44:38.125Z	INFO	Deleting Stargate desired deployment	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "Deployment": "dc1-dc1-default-stargate-deployment"}
2023-05-12T15:44:38.125Z	INFO	reconciling telemetry	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "stargate": "dc1-dc1-stargate"}
2023-05-12T15:44:38.328Z	INFO	Waiting for deployments to be rolled out	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "af5bcd73-5d11-4977-8be5-16c3d2e77dad", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:44:53.329Z	INFO	Fetching Stargate resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:44:53.329Z	INFO	Fetching CassandraDatacenter resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "CassandraDatacenter": {"namespace": "k8ssandra-operator", "name": "dc1"}}
2023-05-12T15:44:53.329Z	INFO	Reconciling Stargate configmap	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16"}
2023-05-12T15:44:53.329Z	INFO	Reconciling Stargate Cassandra yaml configMap on namespace k8ssandra-operator for cluster dc1 and dc dc1	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16"}
2023-05-12T15:44:53.329Z	INFO	Stargate ConfigMap successfully reconciled	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "StargateConfigMap": "k8ssandra-operator/dc1-dc1-cassandra-config"}
2023-05-12T15:44:53.329Z	INFO	Stargate Vector Agent ConfigMap reconciliation complete	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16"}
2023-05-12T15:44:53.330Z	INFO	Deleting Stargate desired deployment	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "Deployment": "dc1-dc1-default-stargate-deployment"}
2023-05-12T15:44:53.330Z	INFO	reconciling telemetry	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "stargate": "dc1-dc1-stargate"}
2023-05-12T15:44:53.533Z	INFO	Waiting for deployments to be rolled out	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "76d69b33-df4c-44c8-9924-f8eeefc17d16", "Stargate": {"namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate"}}
2023-05-12T15:45:08.533Z	INFO	Fetching Stargate resource	{"controller": "stargate", "controllerGroup": "stargate.k8ssandra.io", "controllerKind": "Stargate", "Stargate": {"name":"dc1-dc1-stargate","namespace":"k8ssandra-operator"}, "namespace": "k8ssandra-operator", "name": "dc1-dc1-stargate", "reconcileID": "7a80d875-8873-44fa-abea-c824a924bf73", "Stargate": {"namespace": "k8ssandra-oper

server-system-logger

2023-05-12T15:14:39.030891Z  INFO vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2023-05-12T15:14:39.040637Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-05-12T15:14:39.047118Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
2023-05-12T15:14:39.129848Z  INFO vector::topology::running: Running healthchecks.
2023-05-12T15:14:39.130754Z  INFO vector::topology::builder: Healthcheck passed.
2023-05-12T15:14:39.134847Z  INFO vector: Vector has started. debug="false" version="0.27.0" arch="x86_64" revision="5623d1e 2023-01-18"
2023-05-12T15:14:39.135039Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2023-05-12T15:14:39.137360Z  INFO source{component_kind="source" component_id=systemlog component_type=file component_name=systemlog}: vector::sources::file: Starting file server. include=["/var/log/cassandra/system.log"] exclude=[]
2023-05-12T15:14:39.138446Z  INFO source{component_kind="source" component_id=systemlog component_type=file component_name=systemlog}:file_server: file_source::checkpointer: Attempting to read legacy checkpoint files.

cassandra logs

Defaulted container "cassandra" out of: cassandra, server-system-logger, server-config-init (init)
Starting Management API
Running java -Xms128m -Xmx128m -jar /opt/management-api/datastax-mgmtapi-server.jar --cassandra-socket /tmp/cassandra.sock --host tcp://0.0.0.0:8080 --host file:///tmp/oss-mgmt.sock --explicit-start true --cassandra-home /opt/cassandra
INFO  [main] 2023-05-12 15:14:40,885 Cli.java:332 - Cassandra Version 4.0.1
INFO  [main] 2023-05-12 15:14:41,303 ResteasyDeploymentImpl.java:657 - RESTEASY002225: Deploying javax.ws.rs.core.Application: class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,311 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.LifecycleResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,311 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.K8OperatorResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,312 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.KeyspaceOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,312 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.KeyspaceOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,312 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.MetadataResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,312 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.NodeOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,312 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.NodeOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,313 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.TableOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,313 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.TableOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,313 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.AuthResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,314 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource io.swagger.v3.jaxrs2.integration.resources.OpenApiResource from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,314 ResteasyDeploymentImpl.java:704 - RESTEASY002210: Adding provider singleton io.swagger.v3.jaxrs2.SwaggerSerializers from Application class com.datastax.mgmtapi.ManagementApplication
Started service on tcp://0.0.0.0:8080
INFO  [main] 2023-05-12 15:14:41,601 ResteasyDeploymentImpl.java:657 - RESTEASY002225: Deploying javax.ws.rs.core.Application: class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,601 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.LifecycleResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,603 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.K8OperatorResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,603 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.KeyspaceOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,603 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.KeyspaceOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,603 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.MetadataResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.NodeOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.NodeOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.TableOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.v1.TableOpsResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource com.datastax.mgmtapi.resources.AuthResources from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:691 - RESTEASY002220: Adding singleton resource io.swagger.v3.jaxrs2.integration.resources.OpenApiResource from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,604 ResteasyDeploymentImpl.java:704 - RESTEASY002210: Adding provider singleton io.swagger.v3.jaxrs2.SwaggerSerializers from Application class com.datastax.mgmtapi.ManagementApplication
INFO  [main] 2023-05-12 15:14:41,630 IPCController.java:111 - Starting Server
INFO  [main] 2023-05-12 15:14:41,636 IPCController.java:121 - Started Server
Started service on file:///tmp/oss-mgmt.sock
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:04,095 Cli.java:558 - address=/10.1.98.1:53860 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:15:04,097 Cli.java:558 - address=/10.1.98.1:53858 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:13,971 Cli.java:558 - address=/10.1.98.1:35720 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:15:18,969 Cli.java:558 - address=/10.1.98.1:53024 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:23,968 Cli.java:558 - address=/10.1.98.1:53030 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:33,969 Cli.java:558 - address=/10.1.98.1:57362 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:15:33,969 Cli.java:558 - address=/10.1.98.1:57350 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:15:43,970 Cli.java:558 - address=/10.1.98.1:55780 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:48,969 Cli.java:558 - address=/10.1.98.1:34278 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:15:53,970 Cli.java:558 - address=/10.1.98.1:34294 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:15:58,592 Cli.java:558 - address=/10.1.98.1:51364 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:16:03,969 Cli.java:558 - address=/10.1.98.1:51382 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:16:03,969 Cli.java:558 - address=/10.1.98.1:51380 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:16:13,969 Cli.java:558 - address=/10.1.98.1:59168 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:16:18,969 Cli.java:558 - address=/10.1.98.1:36952 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:16:23,968 Cli.java:558 - address=/10.1.98.1:36960 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:16:33,969 Cli.java:558 - address=/10.1.98.1:44146 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:16:33,969 Cli.java:558 - address=/10.1.98.1:44144 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:16:43,968 Cli.java:558 - address=/10.1.98.1:47302 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:16:48,968 Cli.java:558 - address=/10.1.98.1:53984 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:16:53,968 Cli.java:558 - address=/10.1.98.1:53990 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:17:03,969 Cli.java:558 - address=/10.1.98.1:43692 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:03,969 Cli.java:558 - address=/10.1.98.1:43678 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:13,969 Cli.java:558 - address=/10.1.98.1:49568 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:17:18,968 Cli.java:558 - address=/10.1.98.1:34308 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:22,594 Cli.java:558 - address=/10.1.98.1:34310 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:17:23,969 Cli.java:558 - address=/10.1.98.1:34316 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:33,968 Cli.java:558 - address=/10.1.98.1:37814 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:17:33,970 Cli.java:558 - address=/10.1.98.1:37830 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:43,969 Cli.java:558 - address=/10.1.98.1:37210 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:17:48,969 Cli.java:558 - address=/10.1.98.1:43030 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:17:53,969 Cli.java:558 - address=/10.1.98.1:43034 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:18:03,970 Cli.java:558 - address=/10.1.98.1:36216 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:03,970 Cli.java:558 - address=/10.1.98.1:36206 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:13,970 Cli.java:558 - address=/10.1.98.1:45668 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:18:18,968 Cli.java:558 - address=/10.1.98.1:55884 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:23,968 Cli.java:558 - address=/10.1.98.1:55894 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:18:33,968 Cli.java:558 - address=/10.1.98.1:47118 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:33,970 Cli.java:558 - address=/10.1.98.1:47134 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:18:43,968 Cli.java:558 - address=/10.1.98.1:47858 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:44,591 Cli.java:558 - address=/10.1.98.1:47872 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:18:48,969 Cli.java:558 - address=/10.1.98.1:52512 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:18:53,969 Cli.java:558 - address=/10.1.98.1:52522 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:19:03,969 Cli.java:558 - address=/10.1.98.1:35586 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:03,970 Cli.java:558 - address=/10.1.98.1:35574 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:13,968 Cli.java:558 - address=/10.1.98.1:58164 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:19:18,969 Cli.java:558 - address=/10.1.98.1:48702 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:23,967 Cli.java:558 - address=/10.1.98.1:48708 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:19:33,968 Cli.java:558 - address=/10.1.98.1:39222 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:33,971 Cli.java:558 - address=/10.1.98.1:39226 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:19:43,969 Cli.java:558 - address=/10.1.98.1:40916 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:48,968 Cli.java:558 - address=/10.1.98.1:52428 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:19:53,968 Cli.java:558 - address=/10.1.98.1:52442 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:19:59,589 Cli.java:558 - address=/10.1.98.1:43914 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:20:03,969 Cli.java:558 - address=/10.1.98.1:43926 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:20:03,972 Cli.java:558 - address=/10.1.98.1:43938 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:20:13,968 Cli.java:558 - address=/10.1.98.1:54652 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:20:18,969 Cli.java:558 - address=/10.1.98.1:35490 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:20:23,968 Cli.java:558 - address=/10.1.98.1:35494 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:20:33,968 Cli.java:558 - address=/10.1.98.1:42372 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:20:33,968 Cli.java:558 - address=/10.1.98.1:42376 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:20:43,968 Cli.java:558 - address=/10.1.98.1:54560 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:20:48,968 Cli.java:558 - address=/10.1.98.1:46260 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:20:53,968 Cli.java:558 - address=/10.1.98.1:46270 url=/api/v0/probes/readiness status=500 Internal Server Error
INFO  [nioEventLoopGroup-2-1] 2023-05-12 15:21:03,970 Cli.java:558 - address=/10.1.98.1:40030 url=/api/v0/probes/liveness status=200 OK
INFO  [nioEventLoopGroup-2-2] 2023-05-12 15:21:03,970 Cli.java:558 - address=/10.1.98.1:40018 url=/api/v0/probes/readiness status=500 Internal Server Error
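
The liveness probe returning 200 while the readiness probe keeps returning 500 suggests the Management API process is up but Cassandra itself has not become ready inside the container. One way to dig further (a minimal sketch, reusing the pod and datacenter names from the status output below):

kubectl -n k8ssandra-operator describe pod dc1-dc1-default-sts-0
kubectl -n k8ssandra-operator logs dc1-dc1-default-sts-0 -c cassandra --tail=200
kubectl -n k8ssandra-operator get cassandradatacenter dc1 -o jsonpath='{.status.cassandraOperatorProgress}'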

Anything else we need to know?:

Name:         dc1
Namespace:    k8ssandra-operator
Labels:       <none>
Annotations:  k8ssandra.io/initial-system-replication: {"dc1":1}
API Version:  k8ssandra.io/v1alpha1
Kind:         K8ssandraCluster
Metadata:
  Creation Timestamp:  2023-05-06T08:45:34Z
  Finalizers:
    k8ssandracluster.k8ssandra.io/finalizer
  Generation:        3
  Resource Version:  118628556
  UID:               7f9946f6-1534-4573-8a64-ab6af90e61c5
Spec:
  Auth:  true
  Cassandra:
    Datacenters:
      Config:
        Jvm Options:
          Gc:         G1GC
          Heap Size:  512M
      Jmx Init Container Image:
        Name:      busybox
        Registry:  docker.io
        Tag:       1.34.1
      Metadata:
        Name:                                dc1
      Per Node Config Init Container Image:  mikefarah/yq:4
      Size:                                  2
      Stargate:
        Affinity:
          Pod Affinity:
            Required During Scheduling Ignored During Execution:
              Label Selector:
                Match Expressions:
                  Key:       cassandra.datastax.com/cluster
                  Operator:  In
                  Values:
                    dc1
              Topology Key:            kubernetes.io/hostname
        Allow Stargate On Data Nodes:  false
        Container Image:
          Registry:        docker.io
          Repository:      stargateio
          Tag:             v1.0.67
        Heap Size:         256M
        Secrets Provider:  internal
        Service Account:   default
        Size:              1
      Stopped:             false
      Storage Config:
        Cassandra Data Volume Claim Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  10Gi
    Jmx Init Container Image:
      Name:                                busybox
      Registry:                            docker.io
      Tag:                                 1.34.1
    Per Node Config Init Container Image:  mikefarah/yq:4
    Server Type:                           cassandra
    Server Version:                        4.0.1
    Superuser Secret Ref:
      Name:          dc1-superuser
  Secrets Provider:  internal
Status:
  Conditions:
    Last Transition Time:  2023-05-06T08:46:23Z
    Status:                True
    Type:                  CassandraInitialized
  Datacenters:
    dc1:
      Cassandra:
        Cassandra Operator Progress:  Ready
        Conditions:
          Last Transition Time:    2023-05-06T08:46:17Z
          Message:                 
          Reason:                  
          Status:                  True
          Type:                    Healthy
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    Stopped
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    ReplacingNodes
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    Updating
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    RollingRestart
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    Resuming
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  False
          Type:                    ScalingDown
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  True
          Type:                    Valid
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  True
          Type:                    Initialized
          Last Transition Time:    2023-05-06T08:46:22Z
          Message:                 
          Reason:                  
          Status:                  True
          Type:                    Ready
        Datacenter Name:           
        Last Server Node Started:  2023-05-12T17:31:41Z
        Node Statuses:
          dc1-dc1-default-sts-0:
            Host ID:          58412d2d-9a63-426b-96d0-1d3e116239b5
        Observed Generation:  1
        Quiet Period:         2023-05-12T17:39:43Z
        Super User Upserted:  2023-05-12T17:39:38Z
        Users Upserted:       2023-05-12T17:39:38Z
      Stargate:
        Available Replicas:  1
        Conditions:
          Last Transition Time:  2023-05-12T17:33:18Z
          Status:                True
          Type:                  Ready
        Deployment Refs:
          dc1-dc1-default-stargate-deployment
        Progress:              Running
        Ready Replicas:        1
        Ready Replicas Ratio:  1/1
        Replicas:              1
        Service Ref:           dc1-dc1-stargate-service
        Updated Replicas:      1
  Error:                       None
Events:
  Type     Reason           Age                 From                         Message
  ----     ------           ----                ----                         -------
  Warning  Reconcile Error  119m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  118m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  116m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  116m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  115m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  115m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  113m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  111m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  110m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  108m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  105m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  103m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  103m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  102m                k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  97m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  95m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  92m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  91m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  89m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  86m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  84m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  81m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  80m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  80m                 k8ssandracluster-controller  k8ssandracluster version check failed: client rate limiter Wait returned an error: context canceled
  Warning  Reconcile Error  78m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  78m                 k8ssandracluster-controller  k8ssandracluster version check failed: client rate limiter Wait returned an error: context canceled
  Warning  Reconcile Error  76m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  75m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  73m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  71m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  70m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  68m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  67m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  65m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  64m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  62m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  61m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  61m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  59m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  56m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  56m                 k8ssandracluster-controller  k8ssandracluster version check failed: client rate limiter Wait returned an error: context canceled
  Warning  Reconcile Error  53m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  50m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  47m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Endpoints Informer to sync
  Warning  Reconcile Error  44m                 k8ssandracluster-controller  Timeout: failed waiting for *v1.Pod Informer to sync
  Warning  Reconcile Error  42m (x4 over 42m)   k8ssandracluster-controller  k8ssandracluster version check failed: Internal error occurred: failed calling webhook "vk8ssandracluster.kb.io": failed to call webhook: Post "https://k8ssandra-operator-webhook-service.k8ssandra-operator.svc:443/validate-k8ssandra-io-v1alpha1-k8ssandracluster?timeout=10s": dial tcp 10.152.183.100:443: connect: connection refused
  Warning  Reconcile Error  37m (x14 over 42m)  k8ssandracluster-controller  no pods in READY state found in datacenter dc1
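A minimal sketch of checks that can narrow down the webhook error above; the service name and the k8ssandra-operator namespace are copied from the error message, and only standard kubectl is used:

# Does the validating-webhook service have ready endpoints after the restart?
kubectl -n k8ssandra-operator get endpoints k8ssandra-operator-webhook-service

# Are the operator pods themselves Running and Ready?
kubectl -n k8ssandra-operator get pods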

dpaks avatar May 12 '23 15:05 dpaks

Hi @dpaks, thanks for raising this issue. Could you please confirm a few things (a sketch of commands for items 1 and 2 follows below):

  1. Can you please provide the StatefulSet status and events?
  2. Can you please provide the provisioning status of the PVCs/PVs attached to the StatefulSet?
  3. Can you please provide more detail on the process you followed here, in particular how you restarted your microk8s node?

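For reference, a minimal sketch of commands that would gather items 1 and 2; the names are assumed to match the manifests earlier in this issue (datacenter dc1 in the k8ssandra-operator namespace), so adjust them to your setup:

# 1. StatefulSet status and recent events
kubectl -n k8ssandra-operator describe sts -l cassandra.datastax.com/datacenter=dc1
kubectl -n k8ssandra-operator get events --sort-by=.lastTimestamp

# 2. Provisioning status of the PVCs/PVs backing the StatefulSet
kubectl -n k8ssandra-operator get pvc,pv
kubectl -n k8ssandra-operator describe pvc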
Miles-Garnsey avatar Jun 13 '23 04:06 Miles-Garnsey

Same issue here. Below are the CassandraDatacenter and StatefulSet from our Cassandra instance:

CassandraDatacenter:
kind: CassandraDatacenter
metadata:
  annotations:
    k8ssandra.io/resource-hash: G+XR9Vyl5INlx/XJ3IFor4uRjeXHgnKnQnpBGZFyvzM=
  creationTimestamp: "2024-03-24T18:12:08Z"
  finalizers:
  - finalizer.cassandra.datastax.com
  generation: 1
  labels:
    app.kubernetes.io/component: cassandra
    app.kubernetes.io/name: k8ssandra-operator
    app.kubernetes.io/part-of: k8ssandra
    k8ssandra.io/cleaned-up-by: k8ssandracluster-controller
    k8ssandra.io/cluster-name: cassandra
    k8ssandra.io/cluster-namespace: cassandra
  name: dc1
  namespace: cassandra
  resourceVersion: "41650"
  uid: a1bbb91d-b6cc-47e5-af72-ca39fa9498f0
spec:
  additionalServiceConfig:
    additionalSeedService: {}
    allpodsService: {}
    dcService: {}
    nodePortService: {}
    seedService: {}
  allowMultipleNodesPerWorker: true
  clusterName: cassandra
  config:
    cassandra-env-sh:
      additional-jvm-opts:
      - -Dcassandra.allow_alter_rf_during_range_movement=true
      - -Dcassandra.system_distributed_replication=dc1:1
      - -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy
      - -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config
      - -Dcassandra.jmx.remote.login.config=CassandraLogin
      - -Dcom.sun.management.jmxremote.authenticate=true
    cassandra-yaml:
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      num_tokens: 16
      role_manager: CassandraRoleManager
    jvm-server-options:
      initial_heap_size: 512000000
      max_heap_size: 512000000
    jvm11-server-options:
      garbage_collector: G1GC
  configBuilderResources: {}
  managementApiAuth: {}
  podTemplateSpec:
    metadata:
      annotations:
        k8ssandra.io/inject-secret: '[{"name":"cassandra-password","path":"/etc/secrets/cassandra-password","containers":["cassandra"]},{"name":"cassandra-medusa","path":"/etc/secrets/cassandra-medusa","containers":["medusa","medusa-restore"]}]'
    spec:
      containers:
      - env:
        - name: LOCAL_JMX
          value: "no"
        - name: METRIC_FILTERS
          value: deny:org.apache.cassandra.metrics.Table deny:org.apache.cassandra.metrics.table
            allow:org.apache.cassandra.metrics.table.live_ss_table_count allow:org.apache.cassandra.metrics.Table.LiveSSTableCount
            allow:org.apache.cassandra.metrics.table.live_disk_space_used allow:org.apache.cassandra.metrics.table.LiveDiskSpaceUsed
            allow:org.apache.cassandra.metrics.Table.Pending allow:org.apache.cassandra.metrics.Table.Memtable
            allow:org.apache.cassandra.metrics.Table.Compaction allow:org.apache.cassandra.metrics.table.read
            allow:org.apache.cassandra.metrics.table.write allow:org.apache.cassandra.metrics.table.range
            allow:org.apache.cassandra.metrics.table.coordinator allow:org.apache.cassandra.metrics.table.dropped_mutations
        name: cassandra
        resources: {}
      - env:
        - name: MEDUSA_MODE
          value: GRPC
        - name: MEDUSA_TMP_DIR
          value: /var/lib/cassandra
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CQL_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: cassandra-medusa
        - name: CQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: cassandra-medusa
        image: docker.io/k8ssandra/medusa:0.19.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - --addr=:50051
          failureThreshold: 10
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: medusa
        ports:
        - containerPort: 50051
          name: grpc
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - --addr=:50051
          failureThreshold: 10
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 8Gi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/cassandra
          name: server-config
        - mountPath: /var/lib/cassandra
          name: server-data
        - mountPath: /etc/medusa
          name: cassandra-medusa
        - mountPath: /etc/podinfo
          name: podinfo
        - mountPath: /etc/medusa-secrets
          name: medusa-password
      initContainers:
      - name: server-config-init
        resources: {}
      - env:
        - name: MEDUSA_MODE
          value: RESTORE
        - name: MEDUSA_TMP_DIR
          value: /var/lib/cassandra
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CQL_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: cassandra-medusa
        - name: CQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: cassandra-medusa
        image: docker.io/k8ssandra/medusa:0.19.1
        imagePullPolicy: IfNotPresent
        name: medusa-restore
        resources:
          limits:
            memory: 8Gi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/cassandra
          name: server-config
        - mountPath: /var/lib/cassandra
          name: server-data
        - mountPath: /etc/medusa
          name: cassandra-medusa
        - mountPath: /etc/podinfo
          name: podinfo
        - mountPath: /etc/medusa-secrets
          name: medusa-password
      volumes:
      - configMap:
          name: cassandra-medusa
        name: cassandra-medusa
      - name: medusa-password
        secret:
          secretName: medusa-password
      - downwardAPI:
          items:
          - fieldRef:
              fieldPath: metadata.labels
            path: labels
        name: podinfo
  racks:
  - name: rack1
  resources:
    limits:
      cpu: "4"
      memory: 4Gi
    requests:
      cpu: "4"
      memory: 4Gi
  serverType: cassandra
  serverVersion: 4.0.8
  size: 1
  storageConfig:
    additionalVolumes:
    - mountPath: /opt/management-api/configs
      name: metrics-agent-config
      volumeSource:
        configMap:
          items:
          - key: metrics-collector.yaml
            path: metrics-collector.yaml
          name: cassandra-dc1-metrics-agent-config
    cassandraDataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: sc-single
  superuserSecretName: cassandra-password
  systemLoggerResources: {}
  users:
  - secretName: cassandra-reaper
    superuser: true
  - secretName: cassandra-medusa
    superuser: true
StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    cassandra.datastax.com/resource-hash: XhNv+3i4YV91Hf2NHyBv5MYm1Ch+ElxlZ7n/Xy1ukWI=
  creationTimestamp: "2024-03-24T18:12:09Z"
  generation: 1
  labels:
    app.kubernetes.io/created-by: cass-operator
    app.kubernetes.io/instance: cassandra-cassandra
    app.kubernetes.io/managed-by: cass-operator
    app.kubernetes.io/name: cassandra
    app.kubernetes.io/version: 4.0.8
    cassandra.datastax.com/cluster: cassandra
    cassandra.datastax.com/datacenter: dc1
    cassandra.datastax.com/rack: rack1
  name: cassandra-dc1-rack1-sts
  namespace: cassandra
  ownerReferences:
  - apiVersion: cassandra.datastax.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CassandraDatacenter
    name: dc1
    uid: a1bbb91d-b6cc-47e5-af72-ca39fa9498f0
  resourceVersion: "42212"
  uid: 64c59125-9611-4bf5-862c-680e94d11263
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      cassandra.datastax.com/cluster: cassandra
      cassandra.datastax.com/datacenter: dc1
      cassandra.datastax.com/rack: rack1
  serviceName: cassandra-dc1-all-pods-service
  template:
    metadata:
      annotations:
        k8ssandra.io/inject-secret: '[{"name":"cassandra-password","path":"/etc/secrets/cassandra-password","containers":["cassandra"]},{"name":"cassandra-medusa","path":"/etc/secrets/cassandra-medusa","containers":["medusa","medusa-restore"]}]'
      creationTimestamp: null
      labels:
        app.kubernetes.io/created-by: cass-operator
        app.kubernetes.io/instance: cassandra-cassandra
        app.kubernetes.io/managed-by: cass-operator
        app.kubernetes.io/name: cassandra
        app.kubernetes.io/version: 4.0.8
        cassandra.datastax.com/cluster: cassandra
        cassandra.datastax.com/datacenter: dc1
        cassandra.datastax.com/node-state: Ready-to-Start
        cassandra.datastax.com/rack: rack1
    spec:
      affinity: {}
      containers:
      - env:
        - name: LOCAL_JMX
          value: "no"
        - name: METRIC_FILTERS
          value: deny:org.apache.cassandra.metrics.Table deny:org.apache.cassandra.metrics.table
            allow:org.apache.cassandra.metrics.table.live_ss_table_count allow:org.apache.cassandra.metrics.Table.LiveSSTableCount
            allow:org.apache.cassandra.metrics.table.live_disk_space_used allow:org.apache.cassandra.metrics.table.LiveDiskSpaceUsed
            allow:org.apache.cassandra.metrics.Table.Pending allow:org.apache.cassandra.metrics.Table.Memtable
            allow:org.apache.cassandra.metrics.Table.Compaction allow:org.apache.cassandra.metrics.table.read
            allow:org.apache.cassandra.metrics.table.write allow:org.apache.cassandra.metrics.table.range
            allow:org.apache.cassandra.metrics.table.coordinator allow:org.apache.cassandra.metrics.table.dropped_mutations
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: DS_LICENSE
          value: accept
        - name: DSE_AUTO_CONF_OFF
          value: all
        - name: USE_MGMT_API
          value: "true"
        - name: MGMT_API_EXPLICIT_START
          value: "true"
        - name: DSE_MGMT_EXPLICIT_START
          value: "true"
        image: cr.k8ssandra.io/k8ssandra/cass-management-api:4.0.8
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - curl
              - -X
              - POST
              - -s
              - -m
              - "0"
              - -o
              - /dev/null
              - --show-error
              - --fail
              - http://localhost:8080/api/v0/ops/node/drain
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v0/probes/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: cassandra
        ports:
        - containerPort: 9042
          name: native
          protocol: TCP
        - containerPort: 9142
          name: tls-native
          protocol: TCP
        - containerPort: 7000
          name: internode
          protocol: TCP
        - containerPort: 7001
          name: tls-internode
          protocol: TCP
        - containerPort: 7199
          name: jmx
          protocol: TCP
        - containerPort: 8080
          name: mgmt-api-http
          protocol: TCP
        - containerPort: 9103
          name: prometheus
          protocol: TCP
        - containerPort: 9000
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/v0/probes/readiness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "4"
            memory: 4Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/management-api/configs
          name: metrics-agent-config
        - mountPath: /var/log/cassandra
          name: server-logs
        - mountPath: /var/lib/cassandra
          name: server-data
        - mountPath: /config
          name: server-config
      - env:
        - name: MEDUSA_MODE
          value: GRPC
        - name: MEDUSA_TMP_DIR
          value: /var/lib/cassandra
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: CQL_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: cassandra-medusa
        - name: CQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: cassandra-medusa
        image: docker.io/k8ssandra/medusa:0.19.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - --addr=:50051
          failureThreshold: 10
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: medusa
        ports:
        - containerPort: 50051
          name: grpc
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/grpc_health_probe
            - --addr=:50051
          failureThreshold: 10
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 8Gi
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/cassandra
          name: server-config
        - mountPath: /var/lib/cassandra
          name: server-data
        - mountPath: /etc/medusa
          name: cassandra-medusa
        - mountPath: /etc/podinfo
          name: podinfo
        - mountPath: /etc/medusa-secrets
          name: medusa-password
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CLUSTER_NAME
          value: cassandra
        - name: DATACENTER_NAME
          value: dc1
        - name: RACK_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels['cassandra.datastax.com/rack']
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: cr.k8ssandra.io/k8ssandra/system-logger:v1.19.0
        imagePullPolicy: IfNotPresent
        name: server-system-logger
        resources:
          limits:
            memory: 128M
          requests:
            cpu: 100m
            memory: 64M
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/management-api/configs
          name: metrics-agent-config
        - mountPath: /var/log/cassandra
          name: server-logs
        - mountPath: /var/lib/vector
          name: vector-lib
      dnsPolicy: ClusterFirst
      initContainers:
      - env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: USE_HOST_IP_FOR_BROADCAST
          value: "false"
        - name: RACK_NAME
          value: rack1
        - name: PRODUCT_VERSION
          value: 4.0.8
        - name: PRODUCT_NAME
          value: cassandra
        - name: CONFIG_FILE_DATA
          value: '{"cassandra-env-sh":{"additional-jvm-opts":["-Dcassandra.allow_alter_rf_during_range_movement=true","-Dcassandra.system_distributed_replication=dc1:1","-Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy","-Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config","-Dcassandra.jmx.remote.login.config=CassandraLogin","-Dcom.sun.management.jmxremote.authenticate=true"]},"cassandra-yaml":{"authenticator":"PasswordAuthenticator","authorizer":"CassandraAuthorizer","num_tokens":16,"role_manager":"CassandraRoleManager"},"cluster-info":{"name":"cassandra","seeds":"cassandra-seed-service,cassandra-dc1-additional-seed-service"},"datacenter-info":{"graph-enabled":0,"name":"dc1","solr-enabled":0,"spark-enabled":0},"jvm-server-options":{"initial_heap_size":512000000,"max_heap_size":512000000},"jvm11-server-options":{"garbage_collector":"G1GC"}}'
        image: cr.dtsx.io/datastax/cass-config-builder:1.0-ubi8
        imagePullPolicy: IfNotPresent
        name: server-config-init
        resources:
          limits:
            cpu: "1"
            memory: 384M
          requests:
            cpu: "1"
            memory: 256M
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: server-config
      - env:
        - name: MEDUSA_MODE
          value: RESTORE
        - name: MEDUSA_TMP_DIR
          value: /var/lib/cassandra
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: CQL_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: cassandra-medusa
        - name: CQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: cassandra-medusa
        image: docker.io/k8ssandra/medusa:0.19.1
        imagePullPolicy: IfNotPresent
        name: medusa-restore
        resources:
          limits:
            memory: 8Gi
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/cassandra
          name: server-config
        - mountPath: /var/lib/cassandra
          name: server-data
        - mountPath: /etc/medusa
          name: cassandra-medusa
        - mountPath: /etc/podinfo
          name: podinfo
        - mountPath: /etc/medusa-secrets
          name: medusa-password
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 999
        runAsGroup: 999
        runAsUser: 999
      terminationGracePeriodSeconds: 120
      volumes:
      - configMap:
          defaultMode: 420
          name: cassandra-medusa
        name: cassandra-medusa
      - name: medusa-password
        secret:
          defaultMode: 420
          secretName: medusa-password
      - downwardAPI:
          defaultMode: 420
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.labels
            path: labels
        name: podinfo
      - emptyDir: {}
        name: server-config
      - emptyDir: {}
        name: server-logs
      - emptyDir: {}
        name: vector-lib
      - configMap:
          defaultMode: 420
          items:
          - key: metrics-collector.yaml
            path: metrics-collector.yaml
          name: cassandra-dc1-metrics-agent-config
        name: metrics-agent-config
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        cassandra.datastax.com/resource-hash: XhNv+3i4YV91Hf2NHyBv5MYm1Ch+ElxlZ7n/Xy1ukWI=
      creationTimestamp: null
      labels:
        app.kubernetes.io/created-by: cass-operator
        app.kubernetes.io/instance: cassandra-cassandra
        app.kubernetes.io/managed-by: cass-operator
        app.kubernetes.io/name: cassandra
        app.kubernetes.io/version: 4.0.8
        cassandra.datastax.com/cluster: cassandra
        cassandra.datastax.com/datacenter: dc1
        cassandra.datastax.com/rack: rack1
      name: server-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: sc-single
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentReplicas: 1
  currentRevision: cassandra-dc1-rack1-sts-5bbcc95d9d
  observedGeneration: 1
  replicas: 1
  updateRevision: cassandra-dc1-rack1-sts-5bbcc95d9d
  updatedReplicas: 1

In my case, the issue sometimes resolves itself after ~20-40 minutes, but this isn't consistent. We use the microk8s hostpath storage class as the PV provisioner.

I did not perform a graceful node restart; I simulated a total power loss.
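A minimal sketch of what could be checked while the pod is stuck after such a power-loss restart, assuming the names from the dump above (K8ssandraCluster cassandra, datacenter dc1, namespace cassandra):

# Operator-level view: status conditions on the CRs
kubectl -n cassandra get cassandradatacenter dc1 -o yaml
kubectl -n cassandra get k8ssandracluster cassandra -o yaml

# Do the hostpath-backed PVC/PV pairs come back Bound, and why is the pod not Ready?
kubectl -n cassandra get pvc,pv
kubectl -n cassandra describe pod -l cassandra.datastax.com/datacenter=dc1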

L1ghtman2k avatar Mar 24 '24 20:03 L1ghtman2k

Update: it seems that, in my case, Cassandra only recovers once I re-apply the Cassandra cluster CR (even if it is identical).
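A minimal sketch of that workaround, assuming the "Cassandra cluster CR" refers to the K8ssandraCluster manifest and that it is stored locally (the filename k8ssandracluster.yaml is only illustrative):

# Re-apply the original, unchanged manifest to nudge the operator into reconciling again
kubectl apply -f k8ssandracluster.yaml

# Confirm what the operator currently sees before and after the re-apply
kubectl -n cassandra get k8ssandracluster cassandra -o yaml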

L1ghtman2k avatar Mar 25 '24 17:03 L1ghtman2k