k8ssandra-operator

Cassandra 4.1 process does not start with ZGC enabled

Open iAlex97 opened this issue 7 months ago • 1 comments

What happened?

I have just started using the K8ssandra operator and cannot wait to migrate our on-premise cluster to it. Having previously run that cluster (version 3.11) with Shenandoah GC and seen the latency improvements, enabling ZGC was among the first things I tried. However, after checking out 4.0-jdk11-G1, the Cassandra pods never fully initialised because the Cassandra process exited immediately on startup.

Did you expect to see something different?

I would expect the cluster to come up normally using the test fixture.

How to reproduce it (as minimally and precisely as possible):

  1. Install k8ssandra operator using Helm
  2. kubectl apply -f manifest.yaml
  3. Readiness probe will always return 500


  • K8ssandra Operator version:


  • Kubernetes version information:

    Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:39:39Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    Kubespray on baremetal

  • Manifests:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
  name: prod
  namespace: k8ssandra-operator
    serverVersion: "4.1.5"

      - metadata:
          name: fsn1

        size: 3

            cpu: 24
            memory: 64Gi
            hugepages-2Mi: 5Gi
            hugepages-2Mi: 5Gi

            storageClassName: topolvm-cassandra
              - ReadWriteOnce
                storage: 300Gi

            heap_initial_size: 4G
            heap_max_size: 4G
            gc: ZGC
            additionalOptions: {}
              # - -XX:ConcGCThreads=1
              # - -XX:ParallelGCThreads=2 # must be >= ConcGCThreads

          hostNetwork: false

  • K8ssandra Operator Logs:

not relevant

Anything else we need to know?:

To debug this, I ran exec on one pod and tried to start the Cassandra process manually like this:

export JAVA_VERSION=11
source /opt/cassandra/conf/cassandra-env.sh

This results in the following output:

Error: VM option 'UseZGC' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: The unlock option must precede 'UseZGC'.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Checking the contents of /opt/cassandra/conf/jvm11-server.options:

--add-exports java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED
--add-exports java.base/sun.nio.ch=ALL-UNNAMED
--add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.server=ALL-UNNAMED
--add-exports java.sql/java.sql=ALL-UNNAMED
--add-opens java.base/java.lang.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.loader=ALL-UNNAMED
--add-opens java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens java.base/jdk.internal.reflect=ALL-UNNAMED
--add-opens java.base/jdk.internal.math=ALL-UNNAMED
--add-opens java.base/jdk.internal.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.util.jar=ALL-UNNAMED
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED

which indeed shows the -XX:+UseZGC flag before -XX:+UnlockExperimentalVMOptions.
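
One way to confirm the ordering without reading the file by eye is to compare the line numbers of the two flags. The following is only a sketch: `check_zgc_flag_order` is a hypothetical helper name, and it assumes each flag sits at the start of its own uncommented line in the options file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether -XX:+UnlockExperimentalVMOptions
# appears before -XX:+UseZGC in a JVM options file.
check_zgc_flag_order() {
  local file="$1"
  local unlock_line zgc_line

  # grep -n prefixes each match with its line number; anchoring at ^ skips
  # commented-out flags. In a basic regex the '+' is a literal character.
  unlock_line=$(grep -n -e '^-XX:+UnlockExperimentalVMOptions' "$file" | head -n 1 | cut -d: -f1 || true)
  zgc_line=$(grep -n -e '^-XX:+UseZGC' "$file" | head -n 1 | cut -d: -f1 || true)

  if [ -z "$zgc_line" ]; then
    echo "no-zgc"            # ZGC is not enabled in this file
  elif [ -z "$unlock_line" ]; then
    echo "unlock-missing"    # ZGC enabled, unlock flag absent
  elif [ "$unlock_line" -lt "$zgc_line" ]; then
    echo "ok"                # unlock flag precedes UseZGC
  else
    echo "wrong-order"       # UseZGC precedes the unlock flag
  fi
}
```

Inside the pod it could be called as `check_zgc_flag_order /opt/cassandra/conf/jvm11-server.options`; a result of `wrong-order` would match the JVM error shown above.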

My workaround was setting -XX:+UnlockExperimentalVMOptions in JVM_OPTS like this:

export JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
# cassandra starts normally
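
If the workaround ends up in a script that may run more than once, a guarded variant avoids adding the flag repeatedly. This is a sketch rather than anything the operator provides: `ensure_unlock_flag` is a hypothetical name, and unlike the one-liner above it prepends the flag so that it precedes whatever JVM_OPTS already holds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure JVM_OPTS contains the unlock flag exactly
# once, placed ahead of any options already in the variable.
ensure_unlock_flag() {
  local flag='-XX:+UnlockExperimentalVMOptions'
  case " ${JVM_OPTS:-} " in
    *" $flag "*)
      ;;  # flag already present, leave JVM_OPTS untouched
    *)
      # Prepend the flag; only add a separating space if JVM_OPTS is non-empty.
      JVM_OPTS="$flag${JVM_OPTS:+ $JVM_OPTS}"
      ;;
  esac
  export JVM_OPTS
}
```

Calling `ensure_unlock_flag` before sourcing cassandra-env.sh should have the same effect as the manual export.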

Finally, I would also like to mention that the use of ZGC should be backed by enabling hugepages on the nodes; missing hugepages was my first guess as to why the Java process refused to start.
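
To rule hugepages in or out on a node, the configured page count can be read from the kernel's memory report. A minimal sketch, assuming a Linux node with the standard /proc/meminfo layout; `hugepages_total` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of huge pages configured on the node.
# Reads /proc/meminfo by default; another file can be passed for testing.
hugepages_total() {
  local meminfo="${1:-/proc/meminfo}"
  # The HugePages_Total line looks like "HugePages_Total:    2560".
  awk '/^HugePages_Total:/ { print $2 }' "$meminfo"
}
```

With 2 MiB pages, the 5Gi requested in the manifest corresponds to 2560 pages, so a value of 0 on a node would point at a node-level hugepages problem and not at the JVM flags.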

iAlex97 · Jul 12 '24 16:07