chaostoolkit-kubernetes
Receiving msg: ConnectionRefusedError: [Errno 111] Connection refused
I am trying to run an experiment as a pod (Job) inside a Kubernetes cluster on GKE. The same experiment runs fine when I run it from the CLI. When I run it as a pod, the logs show that the steady-state hypothesis before the action (i.e. the pod termination) is met, but the hypothesis after the action fails with a 'Connection refused' error. Sometimes the experiment returns the same error even for the hypothesis before the action. I am using a service account for authentication, and at times the application pod is successfully terminated, so authentication shouldn't be the issue.
Here is the config passed to the pod:
apiVersion: v1
kind: ConfigMap
metadata:
  name: newapp-config
data:
  health-http.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance of the application?
    description: If an instance of the application is terminated, the applications as a whole should still be operational.
    tags:
    - k8s
    - pod
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: app-responds-to-requests
        type: probe
        tolerance: 200
        provider:
          type: http
          timeout: 10
          verify_tls: false
          url: http://newapp
          headers:
            Host: newapp.example.com
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          label_selector: app=newapp
          rand: true
          ns: default
      pauses:
        after: 2
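The `tolerance: 200` on the HTTP probe is compared against the probe's result, which the log shows as a dict with `status`, `headers` and `body`. Below is a simplified sketch of that integer-tolerance check, written from our reading of Chaos Toolkit's behaviour rather than copied from chaoslib; `within_int_tolerance` is a hypothetical name. Note that a refused connection never produces such a dict: the probe raises, so the hypothesis fails before any tolerance check happens.

```python
from typing import Any


def within_int_tolerance(tolerance: int, value: Any) -> bool:
    """Return True if a probe result satisfies an integer tolerance.

    Hypothetical, simplified helper (not chaoslib's code): for an HTTP
    probe the result is a dict and only its 'status' is compared; any
    other value is compared directly.
    """
    if isinstance(value, dict) and "status" in value:
        return value["status"] == tolerance
    return value == tolerance


# Shape of the successful probe result from the log (headers/body trimmed).
before_action = {"status": 200, "headers": {"Server": "nginx/1.15.4"}, "body": "..."}
print(within_int_tolerance(200, before_action))
print(within_int_tolerance(200, {"status": 503, "headers": {}, "body": ""}))
```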
Here is the error msg:
$ kubectl logs newapp-chaos-czmbp
[2022-02-03 09:43:03 DEBUG] [cli:70] ###############################################################################
[2022-02-03 09:43:03 DEBUG] [cli:71] Running command 'run'
[2022-02-03 09:43:03 DEBUG] [cli:75] Using settings file '/root/.chaostoolkit/settings.yaml'
[2022-02-03 09:43:04 WARNING] [check:30]
There is a new version (1.11.0) of the chaostoolkit available.
You may upgrade by typing:
$ pip install -U chaostoolkit
Please review changes at https://github.com/chaostoolkit/chaostoolkit/blob/master/CHANGELOG.md
[2022-02-03 09:43:04 DEBUG] [settings:23] The Chaos Toolkit settings file could not be found at '/root/.chaostoolkit/settings.yaml'.
[2022-02-03 09:43:04 DEBUG] [__init__:355] No controls to apply on 'loader'
[2022-02-03 09:43:04 DEBUG] [__init__:355] No controls to apply on 'loader'
[2022-02-03 09:43:04 DEBUG] [caching:25] Building activity cache...
[2022-02-03 09:43:04 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 09:43:04 INFO] [experiment:54] Validating the experiment's syntax
[2022-02-03 09:43:04 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 09:43:04 DEBUG] [secret:74] Loading secrets...
[2022-02-03 09:43:04 DEBUG] [secret:89] Secrets loaded
[2022-02-03 09:43:21 INFO] [experiment:103] Experiment looks valid
[2022-02-03 09:43:21 DEBUG] [caching:42] Clearing activities cache
[2022-02-03 09:43:21 DEBUG] [caching:25] Building activity cache...
[2022-02-03 09:43:21 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 09:43:21 INFO] [experiment:182] Running experiment: What happens if we terminate an instance of the application?
[2022-02-03 09:43:21 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 09:43:21 DEBUG] [secret:74] Loading secrets...
[2022-02-03 09:43:21 DEBUG] [secret:89] Secrets loaded
[2022-02-03 09:43:21 DEBUG] [__init__:39] Initializing controls
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'experiment'
[2022-02-03 09:43:22 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:22 DEBUG] [activity:179] => succeeded with '{'status': 200, 'headers': {'Server': 'nginx/1.15.4', 'Date': 'Thu, 03 Feb 2022 09:43:22 GMT', 'Content-Type': 'text/html', 'Content-Length': '208', 'Last-Modified': 'Thu, 03 Feb 2022 07:21:47 GMT', 'Connection': 'keep-alive', 'ETag': '"61fb828b-d0"', 'Accept-Ranges': 'bytes'}, 'body': "<HTML>\n<HEAD>\n<TITLE>This page is on newapp-v2-866f8798cd-8s424 and is version v2</TITLE>\n</HEAD><BODY>\n<H1>THIS IS HOST newapp-v2-866f8798cd-8s424</H1>\n<H2>And we're running version: v2</H2>\n</BODY>\n</HTML>\n"}'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 DEBUG] [hypothesis:212] allowed tolerance is 200
[2022-02-03 09:43:22 INFO] [hypothesis:222] Steady state hypothesis is met!
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Action: terminate-app-pod
[2022-02-03 09:43:22 DEBUG] [python:34] Activity 'terminate-app-pod' loaded from '/usr/local/lib/python3.8/site-packages/chaosk8s/pod/actions.py'
[2022-02-03 09:43:23 DEBUG] [actions:193] Found 3 pods labelled 'app=newapp' in ns default
[2022-02-03 09:43:23 DEBUG] [activity:181] => succeeded without any result value
[2022-02-03 09:43:23 INFO] [activity:197] Pausing after activity for 2s...
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:25 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:25 DEBUG] [activity:233] Activity failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 156, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/http/client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1004, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 944, in send
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 184, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 168, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7facc1a97e20>: Failed to establish a new connection: [Errno 111] Connection refused
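The innermost exception, `ConnectionRefusedError: [Errno 111]` (ECONNREFUSED on Linux), means nothing accepted the TCP connection at the address `newapp` resolved to; urllib3 then wraps it in `NewConnectionError`. Since the action only pauses for 2 seconds before the second probe, one way to tell a transient refusal from a persistent one is to poll the endpoint with a deadline. A minimal stdlib sketch follows; `wait_for_status` is a hypothetical helper, not part of Chaos Toolkit, and it only retries on refusals (timeouts and DNS errors propagate).

```python
import http.client
import time
from typing import Optional
from urllib.parse import urlsplit


def wait_for_status(url: str, expected: int = 200,
                    host_header: Optional[str] = None,
                    deadline_s: float = 30.0,
                    interval_s: float = 1.0) -> bool:
    """Poll `url` with GET until it answers `expected` or the deadline passes.

    Hypothetical diagnostic helper. Only ConnectionRefusedError and
    unexpected status codes are retried; other errors propagate.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Mirror the probe's Host header override, if any.
    headers = {"Host": host_header} if host_header else {}
    end = time.monotonic() + deadline_s
    while True:
        conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=5)
        try:
            conn.request("GET", path, headers=headers)
            status = conn.getresponse().status
            if status == expected:
                return True
            print(f"got HTTP {status}, retrying...")
        except ConnectionRefusedError as exc:
            # Same condition as "[Errno 111] Connection refused" in the log.
            print(f"connection refused (errno {exc.errno}), retrying...")
        finally:
            conn.close()
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

Run from a pod in the `default` namespace, `wait_for_status("http://newapp", 200, host_header="newapp.example.com")` would show whether the refusals stop after a few seconds (suggesting endpoint readiness) or persist (suggesting the network path, for example the service mesh).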
Link to the container used for the pod: https://github.com/vfarcic/chaostoolkit-container-image/blob/master/Dockerfile
Manifest for the pod/job:
apiVersion: batch/v1
kind: Job
metadata:
  name: newapp-chaos
spec:
  activeDeadlineSeconds: 600
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: newapp-job
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: newapp-chaos
      restartPolicy: Never
      containers:
      - name: chaostoolkit
        image: vfarcic/chaostoolkit:1.4.1-2
        args:
        - --verbose
        - run
        - /experiment/health-http.yaml
        env:
        - name: CHAOSTOOLKIT_IN_POD
          value: "true"
        volumeMounts:
        - name: config
          mountPath: /experiment
          readOnly: true
        resources:
          limits:
            cpu: 20m
            memory: 64Mi
          requests:
            cpu: 20m
            memory: 64Mi
      volumes:
      - name: config
        configMap:
          name: newapp-config
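The `CHAOSTOOLKIT_IN_POD: "true"` variable is, as far as we understand chaosk8s's client setup, what makes it authenticate with the pod's mounted service-account token (in-cluster config) rather than a kubeconfig file. The sketch below paraphrases that selection order as we understand it (a local kubeconfig checked first, then the in-pod flag, then explicitly provided settings); it is not chaosk8s's actual code, `pick_k8s_auth` is hypothetical, and it only reports which source would be used instead of calling the `kubernetes` client. It also simplifies `KUBECONFIG` to a single path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pick_k8s_auth(env: Optional[Mapping[str, str]] = None) -> str:
    """Report which Kubernetes credentials chaosk8s would likely use.

    Hypothetical sketch of our understanding, not chaosk8s's code:
    an existing local kubeconfig wins, then CHAOSTOOLKIT_IN_POD=true
    selects in-cluster (service-account) config, otherwise explicit
    settings are expected.
    """
    env = os.environ if env is None else env
    kubeconfig = env.get("KUBECONFIG") or str(Path.home() / ".kube" / "config")
    if Path(kubeconfig).exists():
        return "kubeconfig"
    if env.get("CHAOSTOOLKIT_IN_POD") == "true":
        return "in-cluster"
    return "explicit-settings"
```

In the Job above, with no kubeconfig baked into the image, this order would select the `newapp-chaos` service account, which matches the reporter's observation that pod termination itself succeeds.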
After struggling for over a day, I tried reinstalling Istio and it worked fine afterwards. What a bummer!