litmus
CPU Hog, Memory Hog, and IO Stress are not working under high load
What happened: When we run the CPU hog, memory hog, and IO stress experiments while the target is under load, they fail with a timeout. Please find the log and workflow manifest attached. When rerun without load, the same manifest works fine. The platform is OpenShift.
What you expected to happen: The CPU hog, memory hog, and IO stress experiments should inject chaos into the target pods under high load.
Where can this issue be corrected? (optional)
The stress-ng process is not injected into the target pod when a high load is applied to the pod.
How to reproduce it (as minimally and precisely as possible): Apply a high load to the target pod (we used NeoLoad, and our target pods run on OpenShift), then start inducing the memory or CPU hog. After the chaos duration elapses, the experiment fails with a timeout error.
Anything else we need to know?:
As per @bipinkm's suggestion, adding some other logs captured during the "Pod IO Stress" experiment. With TOTAL_CHAOS_DURATION=60 and FILESYSTEM_UTILIZATION_PERCENTAGE=20, the experiment ran successfully; please find the helper pod log below: success_logs.txt
With FILESYSTEM_UTILIZATION_PERCENTAGE set higher than 20, the experiment fails; please find the helper pod logs below: failed_logs_pod_io_stress.txt
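For readers without access to the attached manifest, the two tunables mentioned above are normally set as experiment environment variables in a ChaosEngine. The fragment below is only a sketch of that shape, not the reporter's actual manifest: the engine name, namespace, application label, and service account are hypothetical placeholders.

```yaml
# Sketch only: metadata, appinfo, and service account values are
# hypothetical placeholders, not taken from the attached manifest.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-io-stress-engine        # hypothetical name
  namespace: default                # hypothetical namespace
spec:
  engineState: "active"
  appinfo:
    appns: "default"                # hypothetical target namespace
    applabel: "app=target-app"      # hypothetical target label
    appkind: "deployment"
  chaosServiceAccount: pod-io-stress-sa   # hypothetical service account
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # Combination reported above as succeeding
            - name: TOTAL_CHAOS_DURATION
              value: "60"
            # Values above 20 were reported above as failing
            - name: FILESYSTEM_UTILIZATION_PERCENTAGE
              value: "20"
```

Changing only the FILESYSTEM_UTILIZATION_PERCENTAGE value between runs keeps the comparison between the passing and failing cases to a single variable.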
Hi Team, any update on the above issue?
cc: @oumkale @uditgaurav
Hi Team, we are really waiting for your update. Thanks.
Hi @bipinkm, we have added support for graceful termination of the stress chaos experiment in 2.4.0. This should remove the stress process gracefully if a timeout occurs.
Hi @uditgaurav and Team, the Pod IO stress, Pod CPU hog, and Memory Hog experiments are not running in Litmus 2.4.0. Env details are below:
OpenShift Master: v3.11.439
Kubernetes Master: v1.11.0+d4cacc0
litmus version: 2.0.0
For the three experiments mentioned above, the created helper pod fails with the error msg="helper pod failed, err: process exited before the actual cleanup, err: exit status 127"
Attaching the complete helper pod logs and a few screenshots. Please help us run these experiments. pod-cpu-hog-helper-sdfiwb.log
Hi Team,
I am also getting the same kind of error:
"helper pod failed, err: process exited before the actual cleanup, err: exit status 1"
Any solution or workaround?
Hi Team,
This error still exists in Litmus 2.14. Are there any updates on it, or should we migrate completely to 3.x?
I've tested it using Litmus 3.0.0 and the same problem occurs (at least for my application). When the application is not under load, we can spawn the CPU hog and it finishes successfully, but when the application is under some load (25% CPU on the pod), it fails.
Do we have any updates on this? I'm facing a similar problem with 2.14.0. I would like a way to properly debug this and help with the solution.