Count pods probe is inconsistent.

Open chaosdudu opened this issue 5 years ago • 0 comments

Hi, we have been trying the `count_pods` probe and found it inconsistent. When we give a range as the tolerance, the pod count increases, and the pod count the probe returns is also incorrect. In addition, the phase check is inconsistent: it does not recognise running pods and reports them as Pending, even though they are in the Running state when checked with kubectl.
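For readers following the logs below, here is a minimal sketch of how a two-element range tolerance such as `[1, 2]` can be read: the value returned by the probe passes when it lies between the two bounds, both included. The helper name `within_range` is hypothetical and this is not the Chaos Toolkit implementation; the values are the ones that appear in the logs.

```python
def within_range(value, tolerance):
    """Return True when value lies inside the inclusive [lower, upper] range."""
    lower, upper = tolerance
    return lower <= value <= upper


# Values taken from the logs in this report.
print(within_range(2, [1, 3]))  # first run, before and after terminate_pods
print(within_range(2, [1, 2]))  # second run, before terminate_pods
print(within_range(4, [1, 2]))  # second run, after terminate_pods
```

Read this way, the second run deviates because the probe returned 4, not because of the range check itself.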

[2019-02-01 14:49:52 DEBUG] The Chaos Toolkit settings file could not be found at '/home/jakob/.chaostoolkit/settings.yaml'.
[2019-02-01 14:49:52 DEBUG] Building activity cache...
[2019-02-01 14:49:52 DEBUG] Cached 2 activities
[2019-02-01 14:49:52 INFO] Validating the experiment's syntax
[2019-02-01 14:49:52 DEBUG] Loading configuration...
[2019-02-01 14:49:52 DEBUG] Loading secrets...
[2019-02-01 14:49:52 DEBUG] Secrets loaded
[2019-02-01 14:49:52 INFO] Experiment looks valid
[2019-02-01 14:49:52 DEBUG] Clearing activities cache
[2019-02-01 14:49:52 DEBUG] Building activity cache...
[2019-02-01 14:49:52 DEBUG] Cached 2 activities
[2019-02-01 14:49:52 INFO] Running experiment: Test janie-nginx Resilience - at least one pod
[2019-02-01 14:49:52 DEBUG] Loading configuration...
[2019-02-01 14:49:52 DEBUG] Loading secrets...
[2019-02-01 14:49:52 DEBUG] Secrets loaded
[2019-02-01 14:49:52 DEBUG] Initializing controls
[2019-02-01 14:49:52 INFO] Steady state hypothesis: Prometheus running as expected
[2019-02-01 14:49:52 INFO] Probe: count_pods
[2019-02-01 14:49:52 DEBUG] Activity 'count_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py'
[2019-02-01 14:49:52 DEBUG] Using Kubernetes context: default
[2019-02-01 14:49:52 DEBUG] Found 2 pods matching label 'app=janie-nginx' in ns 'chaos'
[2019-02-01 14:49:52 DEBUG] => succeeded with '2'
[2019-02-01 14:49:52 DEBUG] allowed tolerance is [1, 3]
[2019-02-01 14:49:52 INFO] Steady state hypothesis is met!
[2019-02-01 14:49:52 INFO] Action: terminate_pods
[2019-02-01 14:49:52 DEBUG] Activity 'terminate_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/actions.py'
[2019-02-01 14:49:52 DEBUG] Using Kubernetes context: default
[2019-02-01 14:49:52 DEBUG] Found 2 pods labelled 'app=janie-nginx' in ns chaos
[2019-02-01 14:49:52 DEBUG] Pod 'janie-nginx-5795fbf867-l6l4b' match pattern
[2019-02-01 14:49:52 DEBUG] Pod 'janie-nginx-5795fbf867-vcv2p' match pattern
[2019-02-01 14:49:52 DEBUG] Picked pods 'janie-nginx-5795fbf867-l6l4b,janie-nginx-5795fbf867-vcv2p' to be terminated
[2019-02-01 14:49:52 DEBUG] => succeeded without any result value
[2019-02-01 14:49:52 INFO] Pausing after activity for 5s...
[2019-02-01 14:49:57 INFO] Steady state hypothesis: Prometheus running as expected
[2019-02-01 14:49:57 INFO] Probe: count_pods
[2019-02-01 14:49:57 DEBUG] Activity 'count_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py'
[2019-02-01 14:49:57 DEBUG] Using Kubernetes context: default
[2019-02-01 14:49:58 DEBUG] Found 2 pods matching label 'app=janie-nginx' in ns 'chaos'
[2019-02-01 14:49:58 DEBUG] => succeeded with '2'
[2019-02-01 14:49:58 DEBUG] allowed tolerance is [1, 3]
[2019-02-01 14:49:58 INFO] Steady state hypothesis is met!
[2019-02-01 14:49:58 INFO] Let's rollback...
[2019-02-01 14:49:58 INFO] No declared rollbacks, let's move on.
[2019-02-01 14:49:58 INFO] Experiment ended with status: completed
[2019-02-01 14:49:58 DEBUG] Cleaning up controls
[2019-02-01 14:49:58 DEBUG] Clearing activities cache
[2019-02-01 14:51:00 DEBUG] ###############################################################################
[2019-02-01 14:51:00 DEBUG] Running command 'run'
[2019-02-01 14:51:00 DEBUG] Using settings file '/home/jakob/.chaostoolkit/settings.yaml'
[2019-02-01 14:51:01 WARNING] There is a new version (1.0.0rc3) of the chaostoolkit available. You may upgrade by typing:

$ pip install -U chaostoolkit

Please review changes at https://github.com/chaostoolkit/chaostoolkit/blob/master/CHANGELOG.md

[2019-02-01 14:51:01 DEBUG] The Chaos Toolkit settings file could not be found at '/home/jakob/.chaostoolkit/settings.yaml'.
[2019-02-01 14:51:01 DEBUG] Building activity cache...
[2019-02-01 14:51:01 DEBUG] Cached 2 activities
[2019-02-01 14:51:01 INFO] Validating the experiment's syntax
[2019-02-01 14:51:01 DEBUG] Loading configuration...
[2019-02-01 14:51:01 DEBUG] Loading secrets...
[2019-02-01 14:51:01 DEBUG] Secrets loaded
[2019-02-01 14:51:01 INFO] Experiment looks valid
[2019-02-01 14:51:01 DEBUG] Clearing activities cache
[2019-02-01 14:51:01 DEBUG] Building activity cache...
[2019-02-01 14:51:01 DEBUG] Cached 2 activities
[2019-02-01 14:51:01 INFO] Running experiment: Test janie-nginx Resilience - at least one pod
[2019-02-01 14:51:01 DEBUG] Loading configuration...
[2019-02-01 14:51:01 DEBUG] Loading secrets...
[2019-02-01 14:51:01 DEBUG] Secrets loaded
[2019-02-01 14:51:01 DEBUG] Initializing controls
[2019-02-01 14:51:01 INFO] Steady state hypothesis: Prometheus running as expected
[2019-02-01 14:51:01 INFO] Probe: count_pods
[2019-02-01 14:51:01 DEBUG] Activity 'count_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py'
[2019-02-01 14:51:01 DEBUG] Using Kubernetes context: default
[2019-02-01 14:51:03 DEBUG] Found 2 pods matching label 'app=janie-nginx' in ns 'chaos'
[2019-02-01 14:51:03 DEBUG] => succeeded with '2'
[2019-02-01 14:51:03 DEBUG] allowed tolerance is [1, 2]
[2019-02-01 14:51:03 INFO] Steady state hypothesis is met!
[2019-02-01 14:51:03 INFO] Action: terminate_pods
[2019-02-01 14:51:03 DEBUG] Activity 'terminate_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/actions.py'
[2019-02-01 14:51:03 DEBUG] Using Kubernetes context: default
[2019-02-01 14:51:03 DEBUG] Found 2 pods labelled 'app=janie-nginx' in ns chaos
[2019-02-01 14:51:03 DEBUG] Pod 'janie-nginx-5795fbf867-7wjdt' match pattern
[2019-02-01 14:51:03 DEBUG] Pod 'janie-nginx-5795fbf867-zkbvf' match pattern
[2019-02-01 14:51:03 DEBUG] Picked pods 'janie-nginx-5795fbf867-7wjdt,janie-nginx-5795fbf867-zkbvf' to be terminated
[2019-02-01 14:51:03 DEBUG] => succeeded without any result value
[2019-02-01 14:51:03 INFO] Pausing after activity for 10s...
[2019-02-01 14:51:13 INFO] Steady state hypothesis: Prometheus running as expected
[2019-02-01 14:51:13 INFO] Probe: count_pods
[2019-02-01 14:51:13 DEBUG] Activity 'count_pods' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py'
[2019-02-01 14:51:13 DEBUG] Using Kubernetes context: default
[2019-02-01 14:51:14 DEBUG] Found 4 pods matching label 'app=janie-nginx' in ns 'chaos'
[2019-02-01 14:51:14 DEBUG] => succeeded with '4'
[2019-02-01 14:51:14 DEBUG] allowed tolerance is [1, 2]
[2019-02-01 14:51:14 CRITICAL] Steady state probe 'count_pods' is not in the given tolerance so failing this experiment
[2019-02-01 14:51:14 INFO] Let's rollback...
[2019-02-01 14:51:14 INFO] No declared rollbacks, let's move on.
[2019-02-01 14:51:14 INFO] Experiment ended with status: deviated
[2019-02-01 14:51:14 INFO] The steady-state has deviated, a weakness may have been discovered
[2019-02-01 14:51:14 DEBUG] Cleaning up controls
[2019-02-01 14:51:14 DEBUG] Clearing activities cache
[2019-02-01 14:52:21 DEBUG] ###############################################################################
[2019-02-01 14:52:21 DEBUG] Running command 'run'
[2019-02-01 14:52:21 DEBUG] Using settings file '/home/jakob/.chaostoolkit/settings.yaml'
[2019-02-01 14:52:22 WARNING] There is a new version (1.0.0rc3) of the chaostoolkit available. You may upgrade by typing:

---------------____________________________------------------------------_____________________________
[2019-02-01 14:31:47 DEBUG] Activity 'pods_in_phase' loaded from '/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py'
[2019-02-01 14:31:47 DEBUG] Using Kubernetes context: default
[2019-02-01 14:31:47 DEBUG] Found 4 pods matching label 'app=janie-nginx' in ns 'chaos'
[2019-02-01 14:31:47 DEBUG] Activity failed
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/chaoslib/provider/python.py", line 57, in run_python_activity
    return func(**arguments)
  File "/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py", line 105, in pods_in_phase
    name=label_selector, s=d.status.phase, p=phase))
chaoslib.exceptions.ActivityFailed: pod 'app=janie-nginx' is in phase 'Pending' but should be 'Running'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/chaoslib/activity.py", line 224, in run_activity
    result = run_python_activity(activity, configuration, secrets)
  File "/usr/lib/python3.7/site-packages/chaoslib/provider/python.py", line 62, in run_python_activity
    sys.exc_info()[2])
  File "/usr/lib/python3.7/site-packages/chaoslib/provider/python.py", line 57, in run_python_activity
    return func(**arguments)
  File "/usr/lib/python3.7/site-packages/chaosk8s/pod/probes.py", line 105, in pods_in_phase
    name=label_selector, s=d.status.phase, p=phase))
chaoslib.exceptions.ActivityFailed: chaoslib.exceptions.ActivityFailed: pod 'app=janie-nginx' is in phase 'Pending' but should be 'Running'

[2019-02-01 14:31:47 ERROR] => failed: chaoslib.exceptions.ActivityFailed: pod 'app=janie-nginx' is in phase 'Pending' but should be 'Running'
[2019-02-01 14:31:47 WARNING] Probe terminated unexpectedly, so its tolerance could not be validated
[2019-02-01 14:31:47 CRITICAL] Steady state probe 'pods_in_phase' is not in the given tolerance so failing this experiment
[2019-02-01 14:31:47 INFO] Let's rollback...
[2019-02-01 14:31:47 INFO] No declared rollbacks, let's move on.
[2019-02-01 14:31:47 INFO] Experiment ended with status: deviated
[2019-02-01 14:31:47 INFO] The steady-state has deviated, a weakness may have been discovered
[2019-02-01 14:31:47 DEBUG] Cleaning up controls
[2019-02-01 14:31:47 DEBUG] Clearing activities cache
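The error message in the traceback carries the label selector where a pod name would be expected, so it does not say which of the 4 matching pods was Pending. The sketch below shows one way to tally phases per pod for comparison with kubectl output. It is an illustration only: the pod names and phases are made up, `ActivityFailed` is a local stand-in for the chaoslib exception, and the loop in `check_pods_in_phase` is a guess at the probe's behaviour based on the error message, not the chaosk8s source.

```python
from collections import Counter


class ActivityFailed(Exception):
    """Local stand-in for chaoslib.exceptions.ActivityFailed."""


def check_pods_in_phase(pods, label_selector, phase="Running"):
    """Raise as soon as one matching pod is not in the expected phase.

    Assumed behaviour, inferred from the error message in the traceback.
    """
    for pod in pods:
        if pod["phase"] != phase:
            raise ActivityFailed(
                "pod '{name}' is in phase '{s}' but should be '{p}'".format(
                    name=label_selector, s=pod["phase"], p=phase))
    return True


def phase_counts(pods):
    """Count how many pods are in each phase."""
    return Counter(pod["phase"] for pod in pods)


# Hypothetical snapshot of 4 pods matching one label selector.
pods = [
    {"name": "example-pod-a", "phase": "Running"},
    {"name": "example-pod-b", "phase": "Running"},
    {"name": "example-pod-c", "phase": "Pending"},
    {"name": "example-pod-d", "phase": "Pending"},
]

print(dict(phase_counts(pods)))
try:
    check_pods_in_phase(pods, "app=janie-nginx")
except ActivityFailed as exc:
    print(exc)
```

Under these assumptions a single Pending pod is enough to fail the check even when other pods with the same label are Running, which is worth ruling out when comparing against kubectl.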

chaosdudu, Feb 14 '19 10:02