
CPU Hog, Memory Hog, and IO Stress are not working under high load

Open bipinkm opened this issue 3 years ago • 10 comments

What happened: While running the CPU hog, memory hog, and IO stress experiments against pods under load, the experiments fail with a timeout. Please find the logs and workflow manifests attached. Without load, a rerun of the same manifests works fine. The platform is OpenShift.

What you expected to happen: The CPU hog, memory hog, and IO stress experiments should inject chaos into the target pods even under high load.

Where can this issue be corrected? (optional)

The stress-ng process is not injected into the target pod when a high load is applied to the pod.

How to reproduce it (as minimally and precisely as possible): Apply a high load to the target pod (we used NeoLoad; our target pods run on OpenShift), then start the memory or CPU hog experiment. After the chaos duration, the experiment fails with a timeout error.

Anything else we need to know?: CPU and memory hog issue under load.

cpu-hog-dec.txt memory-hog-dec.txt
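For reference, a minimal ChaosEngine of the kind used for the CPU hog run might look like the sketch below. The name, namespace, app labels, service account, and values are placeholders, not values taken from the attached manifests.

```yaml
# Illustrative ChaosEngine for a pod-cpu-hog run; all identifiers are placeholders.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: target-app-cpu-hog
  namespace: target-ns
spec:
  appinfo:
    appns: target-ns
    applabel: app=target-app
    appkind: deployment
  engineState: active
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-cpu-hog
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "60"
            - name: CPU_CORES
              value: "1"
```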

bipinkm avatar Dec 16 '21 13:12 bipinkm

As per @bipinkm's suggestion, adding some other logs captured during the "Pod IO Stress" experiment. If we use TOTAL_CHAOS_DURATION=60 and FILESYSTEM_UTILIZATION_PERCENTAGE=20, the experiment runs successfully; please find the helper pod log below: success_logs.txt

If we use FILESYSTEM_UTILIZATION_PERCENTAGE above 20, the experiment fails; please find the helper pod logs below: failed_logs_pod_io_stress.txt
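For clarity, only the experiment env overrides differed between the successful and failing runs. A sketch of the relevant fragment of the pod-io-stress spec, with all other fields omitted and the values as described above:

```yaml
# Fragment of the pod-io-stress ChaosEngine spec; only the env values
# discussed above are shown, everything else is omitted.
experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
          - name: TOTAL_CHAOS_DURATION
            value: "60"
          - name: FILESYSTEM_UTILIZATION_PERCENTAGE
            value: "20"   # runs with values above 20 failed
```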

GopiChandra25 avatar Dec 21 '21 03:12 GopiChandra25

Hi Team, any update on the above issue?

bipinkm avatar Dec 21 '21 07:12 bipinkm

cc: @oumkale @uditgaurav

imrajdas avatar Dec 21 '21 07:12 imrajdas

Hi Team, we are still waiting for your update. Thanks.

GopiChandra25 avatar Dec 28 '21 07:12 GopiChandra25

Hi @bipinkm, we have added support for graceful termination of the stress chaos experiments in 2.4.0; this should remove the stress process gracefully if a timeout occurs.

uditgaurav avatar Jan 03 '22 07:01 uditgaurav

Hi @uditgaurav and team, the Pod IO Stress, Pod CPU Hog, and Pod Memory Hog experiments are not running in Litmus 2.4.0. Below are the environment details:

OpenShift Master: v3.11.439
Kubernetes Master: v1.11.0+d4cacc0
Litmus version: 2.0.0

For the three experiments mentioned above, the created helper pod is failing with the error msg="helper pod failed, err: process exited before the actual cleanup, err: exit status 127"

Attaching the complete helper pod logs and a few screenshots; please help us run these experiments. pod-cpu-hog-helper-sdfiwb.log

Screenshots: Cpu-hog, Memory-hog
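For context, exit status 127 from a shell typically means the command to be executed was not found, which would point at the stress binary missing from the helper image in use. A hedged sketch of pinning the helper image via the LIB_IMAGE env of the experiment spec; the image tag shown is an assumption and must match the installed chart version, so this is not a verified fix:

```yaml
# Hypothetical override of the helper image used by the CPU hog experiment;
# the tag is an assumption and should match the installed chart version.
experiments:
  - name: pod-cpu-hog
    spec:
      components:
        env:
          - name: LIB_IMAGE
            value: "litmuschaos/go-runner:2.4.0"
```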

GopiChandra25 avatar Jan 24 '22 12:01 GopiChandra25

Hi Team,

I am also getting the same kind of error.

"helper pod failed, err: process exited before the actual cleanup, err: exit status 1"

Any solution or workaround?

sreenusuuda avatar Feb 10 '23 10:02 sreenusuuda

Hi Team,

this error still exists in Litmus 2.14. Are there any updates on it, or should we completely migrate to 3.x?

ash-man avatar Nov 20 '23 09:11 ash-man

I've tested it using Litmus 3.0.0 and see the same problem (at least for my application). When the application is not under load, we can spawn the CPU hog and it finishes successfully, but when the application is under some load (25% CPU on the pod), it fails.

ash-man avatar Nov 21 '23 14:11 ash-man

Do we have any updates on this? I'm facing a similar problem with 2.14.0. I would like a way to properly debug it and help with the solution.

ledbruno avatar Feb 06 '24 12:02 ledbruno