
No application impact observed while running the pod-memory-hog experiment with resource limits set in the pod definition

Open pawanphalak opened this issue 2 years ago • 1 comments

Hi everyone, I am trying the pod-memory-hog experiment and have set a very high value for MEMORY_CONSUMPTION. The documentation says the value of MEMORY_CONSUMPTION is the amount by which memory consumption of the entire pod is increased. However, in our case the memory consumption only rises to a certain level and the pods are not evicted due to OOM. I want to set the parameters so that the pods are evicted due to OOM.

I also noticed the TARGET_CONTAINER parameter is available. Since MEMORY_CONSUMPTION hogs memory for the entire pod, can you clarify why we need to specify the container name? Can you suggest any other parameter that would cause pod eviction due to high memory usage? Will NUMBER_OF_WORKERS help?

FYI: we are running two containers in the pod (the application container and the istio-proxy container), with memory limits of 1.8 GB and 1 GB respectively. However, during chaos I noticed the memory consumption only reaches about 2.8 GB and the pod is not evicted due to OOM.

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosSchedule
metadata:
  name: schedule-pod-memory-hog
  namespace: family-an-1s1-perf
spec:
  engineTemplateSpec:
    annotationCheck: "false"
    components:
      runner:
        runnerAnnotations:
          sidecar.istio.io/inject: "false"
    appinfo:
      appkind: deployment
      applabel: app=productpage
      appns: family-an-1s1-perf
    chaosServiceAccount: pod-memory-hog-sa
    engineState: active
    experiments:
    - name: pod-memory-hog
      spec:
        components:
          experimentAnnotations:
            sidecar.istio.io/inject: "false"
          env:
          - name: STRESS_IMAGE
            value: alexeiled/stress-ng:latest-ubuntu
          - name: MEMORY_CONSUMPTION
            value: "30000"
          - name: TOTAL_CHAOS_DURATION
            value: "500"
    jobCleanUpPolicy: delete
  schedule:
    repeat:
      properties:
        minChaosInterval:
          minute:
            everyNthMinute: 1
  scheduleState: active
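
For comparison, a minimal sketch of env values sized against the limits described above, assuming MEMORY_CONSUMPTION is interpreted in MB and the goal is to push past the 1.8 GB application-container limit rather than request an arbitrarily large amount:

          env:
          # Ask for more memory than the 1.8 GB container limit
          # (value assumed to be in MB).
          - name: MEMORY_CONSUMPTION
            value: "2500"
          # Optionally spread the allocation across several stress workers.
          - name: NUMBER_OF_WORKERS
            value: "4"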

Reference to slack discussion: https://kubernetes.slack.com/archives/CNXNB0ZTN/p1661325872677929

pawanphalak · Sep 19 '22 11:09

We faced similar issues with network-related experiments. Setting the target container solved the problem.
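
For this issue that would mean pointing the experiment at the application container explicitly, for example (the container name here is a placeholder):

          - name: TARGET_CONTAINER
            value: "productpage"

With TARGET_CONTAINER set, the injected memory stress is accounted against that container's own limit, which is typically what triggers the OOM kill once the limit is exceeded.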

ledbruno · Feb 06 '24 12:02