
"OOMKilled" and "CrashLoopBackOff" errors deploying Cassandra on Minikube

Open learner00000 opened this issue 9 months ago • 1 comments

Previously I had 32GB of RAM on my laptop and could deploy Cassandra without issues on a 3-node Minikube cluster (8GB of RAM per node) using the following .yaml file:

apiVersion: v1
kind: Service
metadata:
  name: cassandra-srv
spec:
  type: NodePort 
  ports:
    - port: 9042  
      targetPort: 9042  
      nodePort: 30042  
  selector:
    app: cassandra

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra-srv
  replicas: 2
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:latest
          ports:
            - containerPort: 9042 
          env:
            - name: CASSANDRA_AUTHENTICATOR
              value: PasswordAuthenticator
            - name: CASSANDRA_AUTHORIZATION
              value: CassandraAuthorizer
            - name: CASSANDRA_PASSWORD_SEEDER
              value: "true"
            - name: CASSANDRA_SUPERUSER_PASSWORD
              value: "11111"
            - name: CASSANDRA_SEEDS
              value: cassandra-0.cassandra-srv.default.svc.cluster.local
          volumeMounts:
            - name: cassandra-data
              mountPath: /var/lib/cassandra
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

Then I upgraded my laptop, replacing the RAM with a new 64GB kit. I ran minikube delete and created a new cluster, this time with double the memory per node, using the following command (the only change was replacing --memory=8192 with --memory=16384):


  minikube start --driver=docker \
    --cpus=2 --memory=16384 --disk-size=16g \
    --nodes=3 --addons=registry \
    --insecure-registry="10.0.0.0/24" \
    --insecure-registry="192.168.49.0/24" 

When I execute skaffold run with the same manifest files as before, completely unchanged, I now see "OOMKilled" and "CrashLoopBackOff" errors.

I also tried adding resources to the container as follows, but it didn't help:

resources:
  requests:
    memory: "4Gi"
    cpu: "1"
  limits:
    memory: "8Gi"
    cpu: "2"

learner00000, May 06 '25 19:05

Cassandra, by default, tries to consume almost all the available memory of the system unless configured to do otherwise. I would suggest the MAX_HEAP_SIZE and HEAP_NEWSIZE settings as good places to start: use them to give it more reasonable values based on your available resources (and on how much of those resources you expect/hope to share with other services).
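As a rough illustration of picking such values, the sketch below derives heap settings from the 8Gi memory limit mentioned in the question. The variable names limit_mib, max_heap_mib, and heap_new_mib are hypothetical, and the ratios (half the limit for the heap, a quarter of the heap for the young generation) are assumptions chosen for illustration, not official recommendations.

```shell
#!/bin/sh
# Hypothetical sizing helper; the ratios below are assumptions, not Cassandra defaults.
limit_mib=8192                       # container memory limit in MiB (the 8Gi limit from the question)
max_heap_mib=$((limit_mib / 2))      # assumption: leave half the limit for off-heap memory
heap_new_mib=$((max_heap_mib / 4))   # assumption: young generation is a quarter of the heap

# Print the values in the NAME=value form used for environment variables.
echo "MAX_HEAP_SIZE=${max_heap_mib}M"
echo "HEAP_NEWSIZE=${heap_new_mib}M"
```

The printed values could then be passed to the container, for example with -e on docker run or as extra env entries next to CASSANDRA_SEEDS in the StatefulSet, assuming the image's startup scripts read these variables from the environment.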

As far as I know, it is also not cgroup-aware by default, so a limit there without telling Cassandra to stay under it explicitly will likely lead to OOM even more reliably:

$ docker run -it --rm --memory 1g --pull=always cassandra
latest: Pulling from library/cassandra
Digest: sha256:ee5be67d740b5a427881effcfb672b6c986122ec139eada751f82bca247d6904
Status: Image is up to date for cassandra:latest
$ echo $?
137

(Exit code 137 is 128 + 9, which typically means the process was terminated by SIGKILL, and in this case that is almost certainly the OOM killer.)
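To see where 137 comes from without involving Docker, here is a small sketch: a throwaway child shell kills itself with SIGKILL, and the parent decodes the resulting exit status. The variable names status and signal are only for illustration.

```shell
#!/bin/sh
# Run a child shell that sends SIGKILL to itself; capture its exit status.
status=0
sh -c 'kill -KILL $$' || status=$?

# Shells report "terminated by signal N" as 128 + N.
signal=$((status - 128))

echo "exit status: $status"
echo "signal number: $signal"
echo "signal name: $(kill -l "$signal")"
```

On Linux this reports status 137 and signal 9 (KILL). It only shows that the process was killed by SIGKILL, not who sent it, so confirming an OOM kill still needs other evidence, such as the OOMKilled reason Kubernetes reports for the pod.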

tianon, May 06 '25 21:05