
Scheduler OOMKilled in Openshift

Open wiika opened this issue 1 year ago • 2 comments

Hi there.

We have installed Keptn on OpenShift (ARO 4.13.40) using the Helm chart. The scheduler pod keeps getting killed because it runs out of memory (OOMKilled and CrashLoopBackOff in OpenShift). We tried setting scheduler.resources.requests.memory and scheduler.resources.limits.memory in the Helm parameters, but they were not applied to the pod. Current status in the pod:

 containers:
    - resources:
        limits:
          cpu: 300m
          memory: 100Mi
        requests:
          cpu: 100m
          memory: 20Mi
      name: scheduler

Our ArgoCD config:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keptn
  finalizers:
    - resources-finalizer.argocd.argoproj.io # enabling cascading deletion
spec:
  destination:
    namespace: keptn-system
    server: 'https://kubernetes.default.svc'
  source:
    repoURL: 'https://charts.lifecycle.keptn.sh'
    targetRevision: 0.8.0
    chart: keptn
    helm:
      parameters:
        - name: "commitID"
          value: "$ARGOCD_APP_REVISION"
        - name: global.openShift.enabled
          value: 'true'
        - name: scheduler.resources.requests.memory
          value: '100Mi'
        - name: scheduler.resources.limits.memory
          value: '200Mi'
  sources: []
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

wiika avatar Sep 20 '24 11:09 wiika

Hi @wiika ! The scheduler is actually part of the lifecycle operator subchart, so for the values to be set correctly please try prefixing them with lifecycleOperator.
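
Applied to the ArgoCD config above, the prefixed parameters would look roughly like this. This is only a sketch: the full key path assumes the scheduler values sit directly under the lifecycleOperator key of the umbrella chart, and the memory values are the ones from the original post.

```yaml
# Fragment of spec.source.helm in the ArgoCD Application.
# Assumption: the subchart values are reachable under the "lifecycleOperator." prefix.
helm:
  parameters:
    - name: global.openShift.enabled
      value: 'true'
    - name: lifecycleOperator.scheduler.resources.requests.memory
      value: '100Mi'
    - name: lifecycleOperator.scheduler.resources.limits.memory
      value: '200Mi'
```

After syncing, the resources block of the scheduler container in the pod spec should show the new memory values instead of 20Mi and 100Mi.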

mowies avatar Sep 23 '24 06:09 mowies

Your OpenShift version is based on k8s 1.26, but from k8s 1.27 onwards the scheduler is disabled by default anyway. You could already try disabling it in favor of scheduling gates on a dev environment to see if that works for you as well. You can check out some docs about it here.

mowies avatar Sep 23 '24 08:09 mowies

@mowies ,

thanks, the lifecycleOperator prefix worked for propagating the parameters. I will check out scheduling gates.

wiika avatar Sep 26 '24 08:09 wiika