scylla-operator

Init container gets OOM killed on new cluster POD startup

Open andrey-dubnik opened this issue 1 year ago • 10 comments

What happened?

Created a local cluster using k3s.
Deployed the operator.
Created the ScyllaCluster; the init container got OOM killed.
Edited the StatefulSet (STS) and increased the init container memory to 150Mi; after that the cluster was created.
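One way to confirm that an init container was OOM killed is to read its termination reason from the pod status. The following is a minimal sketch, not part of the original report. The pod name `temporal-cluster-manager-dc-zone1-0` is an assumption inferred from the ScyllaCluster manifest in the reproduction steps, and the helper function names are hypothetical.

```shell
#!/bin/sh
# Assumed names; adjust them to your cluster.
NAMESPACE="${NAMESPACE:-temporal}"
POD="${POD:-temporal-cluster-manager-dc-zone1-0}"

# Hypothetical helper: emits a kubectl JSONPath template that prints one line
# per init container: name, current termination reason, last termination reason.
init_reason_jsonpath() {
  printf '%s' '{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state.terminated.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Hypothetical helper: queries the pod status using the template above.
show_init_termination_reasons() {
  kubectl -n "$NAMESPACE" get pod "$POD" -o "jsonpath=$(init_reason_jsonpath)"
}

# Usage (requires kubectl access to the cluster):
#   show_init_termination_reasons
```

A container that was killed by the kernel OOM killer reports the reason `OOMKilled` in one of the two reason columns.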

Unfortunately, the init container limits seem to be hard-coded, so there is no way to change the allocation through the ScyllaCluster spec.
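The manual STS edit described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported fix: the StatefulSet name `temporal-cluster-manager-dc-zone1` and the init container index `0` are assumptions (check them with the listing helper first), the helper names are hypothetical, and the operator may overwrite a manual edit when it next reconciles the StatefulSet.

```shell
#!/bin/sh
# Assumed names; adjust them to your cluster.
NAMESPACE="${NAMESPACE:-temporal}"
STS="${STS:-temporal-cluster-manager-dc-zone1}"

# Hypothetical helper: builds a JSON Patch that sets the memory limit of the
# init container at index $1 to the quantity $2. The "add" operation replaces
# the value if it already exists; the parent "limits" object must exist.
build_memory_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/initContainers/%s/resources/limits/memory","value":"%s"}]' "$1" "$2"
}

# Hypothetical helper: lists each init container with its current resources.
list_init_resources() {
  kubectl -n "$NAMESPACE" get statefulset "$STS" \
    -o 'jsonpath={range .spec.template.spec.initContainers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}

# Hypothetical helper: applies the patch to the StatefulSet.
patch_init_memory() {
  kubectl -n "$NAMESPACE" patch statefulset "$STS" --type=json \
    -p "$(build_memory_patch "$1" "$2")"
}

# Usage (requires kubectl access to the cluster):
#   list_init_resources
#   patch_init_memory 0 150Mi
```

Keeping the patch construction in its own function makes it easy to print and review the patch before applying it.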

What did you expect to happen?

No OOM kill, or at least the ability to change the init container limits.

How can we reproduce it (as minimally and precisely as possible)?

k3d cluster create --config $(pwd)/config.yaml

config.yaml

apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: edge-composition
servers: 1
agents: 3
image: rancher/k3s:v1.22.11-k3s1
kubeAPI: # same as `--api-port localhost:6447`
  host: "localhost"
  hostIP: "127.0.0.1"
  hostPort: "6447"
# expose ingress controller on local host port 9980
ports:
  - port: 9980:80 # same as `--port '9980:80@loadbalancer'`
    nodeFilters:
      - loadbalancer
  - port: 9943:443
    nodeFilters:
      - loadbalancer
options:
  k3s:
    extraArgs:
    - arg: --no-deploy=traefik # do not deploy traefik ingress, we will use a different one
      nodeFilters:
          - server:*
    nodeLabels:
      - label: topology.kubernetes.io/zone=3 # same as `--k3s-node-label 'topology.kubernetes.io/zone=3@agent:2'`
        nodeFilters:
          - agent:2
      - label: topology.kubernetes.io/zone=2 # same as `--k3s-node-label 'topology.kubernetes.io/zone=2@agent:0'`
        nodeFilters:
          - agent:0
      - label: topology.kubernetes.io/zone=1 # same as `--k3s-node-label 'topology.kubernetes.io/zone=1@agent:1'`
        nodeFilters:
          - agent:1

Deploy the operator

kubectl apply -f https://raw.githubusercontent.com/scylladb/scylla-operator/master/deploy/operator.yaml

kubectl wait --for condition=established crd/scyllaclusters.scylla.scylladb.com
kubectl wait --for condition=established crd/nodeconfigs.scylla.scylladb.com
kubectl wait --for condition=established crd/scyllaoperatorconfigs.scylla.scylladb.com
kubectl -n scylla-operator rollout status deployment.apps/scylla-operator
kubectl -n scylla-operator rollout status deployment.apps/webhook-server

Create cluster

apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: temporal-cluster
  namespace: temporal
spec:
  version: 5.2.7
  agentVersion: 3.1.2
  repository: docker.io/scylladb/scylla
  agentRepository: docker.io/scylladb/scylla-manager-agent
  developerMode: true
  cpuset: true
  datacenter:
    name: manager-dc
    racks:
      - agentResources:
          requests:
            cpu: 50m
            memory: 80M
        members: 1
        name: zone1
        resources:
          limits:
            cpu: 1
            memory: 200Mi
          requests:
            cpu: 1
            memory: 200Mi
        storage:
          capacity: 1Gi
          # storageClassName: scylla-manager
        placement:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: topology.kubernetes.io/zone
                      operator: In
                      values:
                        - "1"
      - agentResources:
          requests:
            cpu: 50m
            memory: 80M
        members: 1
        name: zone2
        resources:
          limits:
            cpu: 1
            memory: 200Mi
          requests:
            cpu: 1
            memory: 200Mi
        storage:
          capacity: 1Gi
          # storageClassName: scylla-manager
        placement:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: topology.kubernetes.io/zone
                      operator: In
                      values:
                        - "2"
      - agentResources:
          requests:
            cpu: 50m
            memory: 80M
        members: 1
        name: zone3
        resources:
          limits:
            cpu: 1
            memory: 200Mi
          requests:
            cpu: 1
            memory: 200Mi
        storage:
          capacity: 1Gi
          # storageClassName: scylla-manager
        placement:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: topology.kubernetes.io/zone
                      operator: In
                      values:
                        - "3"

Scylla Operator version

v1.12.0-alpha.0-102-geb68db4; also reproducible on v1.11.0

Kubernetes platform name and version

Reproduced on 1.21 and 1.25.

```console
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:40:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.15+k3s1", GitCommit:"d19260dc59280c5f5a3c6596c653e7cfdbb5f3c8", GitTreeState:"clean", BuildDate:"2023-10-30T21:44:53Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
```

Kubernetes platform info:

Please attach the must-gather archive.

scylla-operator-must-gather-hdqcl4psgfqd.zip

Anything else we need to know?

No response

andrey-dubnik • Nov 14 '23 12:11