k8ssandra-operator
Cassandra 4.1 process does not start with ZGC enabled
What happened?
I just got started with the K8ssandra operator and cannot wait to migrate our on-premise cluster to it. Having previously run that cluster (version 3.11) with Shenandoah GC and seen the latency improvements, enabling ZGC was among the first things I tried. However, after checking out 4.0-jdk11-G1, the Cassandra pods never fully initialised because the Cassandra process exited immediately after being started.
Did you expect to see something different?
I would expect the cluster to come up normally using the test fixture.
How to reproduce it (as minimally and precisely as possible):
- Install k8ssandra operator using Helm
- kubectl apply -f manifest.yaml
- Readiness probe will always return 500
Environment
- K8ssandra Operator version: 1.17.0
- Kubernetes version information: Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:39:39Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind: Kubespray on baremetal
- Manifests:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: prod
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: "4.1.5"
    datacenters:
      - metadata:
          name: fsn1
        size: 3
        resources:
          requests:
            cpu: 24
            memory: 64Gi
            hugepages-2Mi: 5Gi
          limits:
            hugepages-2Mi: 5Gi
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: topolvm-cassandra
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 300Gi
        config:
          jvmOptions:
            heap_initial_size: 4G
            heap_max_size: 4G
            gc: ZGC
            additionalOptions: {}
            # - -XX:ConcGCThreads=1
            # - -XX:ParallelGCThreads=2 # must be >= ConcGCThreads
        networking:
          hostNetwork: false
- K8ssandra Operator Logs: not relevant
Anything else we need to know?:
To debug this, I ran exec on one pod and tried to start the Cassandra process manually like this:
export JAVA_VERSION=11
source /opt/cassandra/conf/cassandra-env.sh
/opt/cassandra/bin/cassandra
This results in the following output:
Error: VM option 'UseZGC' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: The unlock option must precede 'UseZGC'.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Checking the contents of /opt/cassandra/conf/jvm11-server.options:
-Djdk.attach.allowAttachSelf=true
--add-exports java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED
--add-exports java.base/sun.nio.ch=ALL-UNNAMED
--add-exports java.management.rmi/com.sun.jmx.remote.internal.rmi=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.server=ALL-UNNAMED
--add-exports java.sql/java.sql=ALL-UNNAMED
--add-opens java.base/java.lang.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.loader=ALL-UNNAMED
--add-opens java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens java.base/jdk.internal.reflect=ALL-UNNAMED
--add-opens java.base/jdk.internal.math=ALL-UNNAMED
--add-opens java.base/jdk.internal.module=ALL-UNNAMED
--add-opens java.base/jdk.internal.util.jar=ALL-UNNAMED
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
-Dio.netty.tryReflectionSetAccessible=true
-XX:+UseZGC
-XX:+UnlockExperimentalVMOptions
which indeed shows the -XX:+UseZGC flag before -XX:+UnlockExperimentalVMOptions.
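The ordering problem can be checked mechanically. The following is only a minimal sketch, assuming the options file has one flag per line as in the listing above; the function name zgc_flag_order_ok is hypothetical and not part of Cassandra or the operator.

```shell
# Hypothetical helper (not shipped with Cassandra or k8ssandra-operator).
# Exits 0 when -XX:+UnlockExperimentalVMOptions appears on an earlier line
# than -XX:+UseZGC, or when -XX:+UseZGC is not present at all.
# Exits 1 when -XX:+UseZGC comes first or the unlock flag is missing.
zgc_flag_order_ok() {
  awk '
    # Remember the line number of the first occurrence of each flag.
    $1 == "-XX:+UnlockExperimentalVMOptions" && !unlock { unlock = NR }
    $1 == "-XX:+UseZGC" && !zgc { zgc = NR }
    END { exit !(zgc == 0 || (unlock != 0 && unlock < zgc)) }
  ' "$1"
}
```

Run inside the pod as zgc_flag_order_ok /opt/cassandra/conf/jvm11-server.options, this would return a non-zero status for the file shown above, since -XX:+UseZGC is listed one line before the unlock flag.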
My workaround was setting -XX:+UnlockExperimentalVMOptions in JVM_OPTS like this:
export JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
/opt/cassandra/bin/cassandra
# cassandra starts normally
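If the export above ends up in a script that may run more than once, the flag can be added idempotently. This is only a sketch of the same manual workaround, not a fix in the operator; the helper name add_jvm_opt is hypothetical.

```shell
# Hypothetical helper: append a flag to JVM_OPTS only if it is not already
# present, so repeated invocations do not duplicate it.
add_jvm_opt() {
  case " ${JVM_OPTS:-} " in
    *" $1 "*) ;;                                  # already present, do nothing
    *) JVM_OPTS="${JVM_OPTS:+$JVM_OPTS }$1" ;;    # append, space-separated
  esac
}

add_jvm_opt "-XX:+UnlockExperimentalVMOptions"
export JVM_OPTS
```

As with the manual export, this relies on the flags already in JVM_OPTS reaching the JVM ahead of those read from jvm11-server.options.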
Finally, I would also like to mention that the use of ZGC should be backed by enabling hugepages on the nodes. Missing hugepages was my first guess as to why the Java process refused to start.