
K8SSAND-954 ⁃ Unable to create CassandraDatacenter when containers.securityContext.readOnlyRootFilesystem is set to true

Open zhimsun opened this issue 4 years ago • 5 comments

What happened? I tried to create a CassandraDatacenter with containers.securityContext.readOnlyRootFilesystem: true, but the pod is always in CrashLoopBackOff status.

The pods are running normally if I change the containers.securityContext.readOnlyRootFilesystem: false

The yaml

# Sized to work on 3 k8s workers nodes with 1 core / 4 GB RAM
# See neighboring example-cassdc-full.yaml for docs for each parameter
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc21
spec:
  nodeAffinityLabels:
    beta.kubernetes.io/arch: amd64
  clusterName: cluster2
  serverType: dse
  serverVersion: "6.8.14"
  systemLoggerImage: 
  serverImage: 
  configBuilderImage: 
  managementApiAuth:
    insecure: {}
  size: 1
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      cpu: 1
      memory: 4Gi
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  dockerImageRunsAsCassandra: false
  podTemplateSpec:
    spec:
      initContainers:
      - name: server-config-init
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
      containers:
      - name: "cassandra"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
      hostIPC: false
      hostNetwork: false
      hostPID: false
      securityContext:
        runAsNonRoot: true
  config:
    jvm-server-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"
      additional-jvm-opts:
        # As the database comes up for the first time, set system keyspaces to RF=3
        - "-Ddse.system_distributed_replication_dc_names=dc21"
        - "-Ddse.system_distributed_replication_per_dc=3"

The pod status

MacBook-Pro-3:db zhiminsun$ oc get pod 
NAME                                                 READY   STATUS             RESTARTS   AGE
cluster2-dc21-default-sts-0                          1/2     CrashLoopBackOff   213        17h

The pod Events error

Events:
  Warning  BackOff         62s (x7 over 103s)  kubelet, worker2.zhim.cp.fyre.ibm.com  Back-off restarting failed container

Did you expect to see something different? I expect the pod to run normally with containers.securityContext.readOnlyRootFilesystem: true.

┆Issue is synchronized with this Jira Task by Unito ┆Reviewer: Michael Burman ┆friendlyId: K8SSAND-954 ┆priority: Medium

zhimsun, Oct 08 '21 15:10

Hi @zhimsun

What version of cass-operator are you using?

> The pods are running normally if I change the containers.securityContext.readOnlyRootFilesystem: false

Did you make this change for all containers? If not, which one(s)?

I am trying to test and reproduce this with CodeReady Containers, but cass-operator is crashing. It looks like this happens during initialization. I'll try some more.

jsanda, Oct 08 '21 17:10

I tested against my local kind cluster with a slightly modified manifest. Here is mine:

# Sized to work on 3 k8s workers nodes with 1 core / 4 GB RAM
# See neighboring example-cassdc-full.yaml for docs for each parameter
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc21
spec:
#  nodeAffinityLabels:
#    beta.kubernetes.io/arch: amd64
  clusterName: cluster2
  serverType: dse
  serverVersion: "6.8.14"
  systemLoggerImage:
  serverImage:
  configBuilderImage:
  managementApiAuth:
    insecure: {}
  size: 1
#  resources:
#    requests:
#      cpu: 1
#      memory: 4Gi
#    limits:
#      cpu: 1
#      memory: 4Gi
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  dockerImageRunsAsCassandra: false
  podTemplateSpec:
    spec:
      initContainers:
        - name: server-config-init
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
      containers:
        - name: "cassandra"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
      hostIPC: false
      hostNetwork: false
      hostPID: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 65533
        runAsGroup: 65533
        fsGroup: 65533
  config:
    jvm-server-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"
      additional-jvm-opts:
        # As the database comes up for the first time, set system keyspaces to RF=3
        - "-Ddse.system_distributed_replication_dc_names=dc21"
        - "-Ddse.system_distributed_replication_per_dc=3"

I had to update the pod-level securityContext. Without a numeric user and group set, the pod failed to initialize with this error:

    state:
      waiting:
        message: 'container has runAsNonRoot and image has non-numeric user (cassandra),
          cannot verify user is non-root (pod: "cluster2-dc21-default-sts-0_cass-operator(7a5fc807-2b54-4751-9c08-497470fa0ef1)",
          container: server-config-init)'
        reason: CreateContainerConfigError

I deleted my CassandraDatacenter and changed the securityContext. Now I do end up with a CrashLoopBackOff caused by the cassandra container. Here is the error in the logs:

ln: failed to create symbolic link '/opt/dse/resources/spark/conf/hive-site.xml': Read-only file system

I need to pull in someone who is more familiar with DSE for some help.

cc @bradfordcp

jsanda, Oct 08 '21 17:10

@jsanda my cass-operator version is v1.7.1. I only have one container, cassandra.

For the initContainers, I can set readOnlyRootFilesystem: true:

  podTemplateSpec:
    spec:
      initContainers:
      - name: server-config-init
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true

but for the containers I cannot set readOnlyRootFilesystem: true:

      containers:
      - name: "cassandra"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true

zhimsun, Oct 08 '21 18:10

@zhimsun can you share the logs from the cassandra container?

jsanda, Oct 08 '21 18:10
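
For a container stuck in CrashLoopBackOff, the output of the previous (crashed) instance is usually what matters, and `kubectl logs` can return it with `--previous`. The following is a minimal sketch, not something posted in this thread: `previous_logs` is a hypothetical helper name, and the pod name and the `zen` namespace in the usage comment are taken from the commands shown elsewhere in this issue. On OpenShift, `oc` should accept the same arguments.

```shell
# Hypothetical helper: fetch logs from the previous (crashed) instance of a container.
# Arguments: pod name, namespace, container name (defaults to "cassandra").
previous_logs() {
  pod="$1"
  namespace="$2"
  container="${3:-cassandra}"
  # --previous selects the last terminated instance instead of the current one.
  kubectl logs "$pod" -n "$namespace" -c "$container" --previous
}

# Usage (requires access to the cluster):
#   previous_logs cluster2-dc21-default-sts-0 zen
```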

@jsanda The cassandra container was not created:

oc exec -it cluster2-dc21-default-sts-0 -n zen bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulting container name to cassandra.
Use 'oc describe pod/cluster2-dc21-default-sts-0 -n zen' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("cassandra")
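
Because `exec` can only attach to a running container, it fails while the container is between restarts. The last termination record (exit code, reason, timestamps) can instead be read from the pod status. This is a sketch under assumptions: `last_termination` is a hypothetical helper name, and the query relies on the standard `status.containerStatuses[].lastState.terminated` field of the Pod API.

```shell
# Hypothetical helper: print the last termination record of one container in a pod.
# Arguments: pod name, namespace, container name (defaults to "cassandra").
last_termination() {
  pod="$1"
  namespace="$2"
  container="${3:-cassandra}"
  # The jsonpath filter selects the status entry whose name matches the container.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$container\")].lastState.terminated}"
}

# Usage (requires access to the cluster):
#   last_termination cluster2-dc21-default-sts-0 zen
```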

You can reproduce this on your cluster. Fill in the systemLoggerImage, serverImage, and configBuilderImage values:

# Sized to work on 3 k8s workers nodes with 1 core / 4 GB RAM
# See neighboring example-cassdc-full.yaml for docs for each parameter
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc21
spec:
  nodeAffinityLabels:
    beta.kubernetes.io/arch: amd64
  clusterName: cluster2
  serverType: dse
  serverVersion: "6.8.14"
  systemLoggerImage: <image>
  serverImage:  <image>
  configBuilderImage:  <image>
  managementApiAuth:
    insecure: {}
  size: 1
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      cpu: 1
      memory: 4Gi
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  dockerImageRunsAsCassandra: false
  podTemplateSpec:
    spec:
      initContainers:
      - name: server-config-init
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
      containers:
      - name: "cassandra"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
      hostIPC: false
      hostNetwork: false
      hostPID: false
      securityContext:
        runAsNonRoot: true
  config:
    jvm-server-options:
      initial_heap_size: "800M"
      max_heap_size: "800M"
      additional-jvm-opts:
        # As the database comes up for the first time, set system keyspaces to RF=3
        - "-Ddse.system_distributed_replication_dc_names=dc21"
        - "-Ddse.system_distributed_replication_per_dc=3"

zhimsun, Oct 08 '21 18:10