chaostoolkit-kubernetes

Receiving msg: ConnectionRefusedError: [Errno 111] Connection refused

Open gopimandala opened this issue 2 years ago • 1 comments

I am trying to run an experiment as a pod (Job) inside a k8s cluster on GKE. The same experiment runs fine when I run it from the CLI. When I run it as a pod, the logs show that the hypothesis before the action (i.e. pod termination) is met, but the hypothesis after the action returns a 'Connection refused' error. Sometimes the same experiment returns the same error even for the hypothesis before the action. I am using a service account for authentication, and at times the application pod does get terminated successfully, so authentication shouldn't be the issue.

Here is the config passed on to the pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: newapp-config
data:
  health-http.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance of the application?
    description: If an instance of the application is terminated, the application as a whole should still be operational.
    tags:
    - k8s
    - pod
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: app-responds-to-requests
        type: probe
        tolerance: 200
        provider:
          type: http
          timeout: 10
          verify_tls: false
          url: http://newapp
          headers:
            Host: newapp.example.com
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          label_selector: app=newapp
          rand: true
          ns: default
      pauses: 
        after: 2

Here is the error msg:

$ kubectl logs newapp-chaos-czmbp
[2022-02-03 09:43:03 DEBUG] [cli:70] ###############################################################################
[2022-02-03 09:43:03 DEBUG] [cli:71] Running command 'run'
[2022-02-03 09:43:03 DEBUG] [cli:75] Using settings file '/root/.chaostoolkit/settings.yaml'
[2022-02-03 09:43:04 WARNING] [check:30] 
    There is a new version (1.11.0) of the chaostoolkit available.
    You may upgrade by typing:
    
    $ pip install -U chaostoolkit
    
    Please review changes at https://github.com/chaostoolkit/chaostoolkit/blob/master/CHANGELOG.md
    
[2022-02-03 09:43:04 DEBUG] [settings:23] The Chaos Toolkit settings file could not be found at '/root/.chaostoolkit/settings.yaml'.
[2022-02-03 09:43:04 DEBUG] [__init__:355] No controls to apply on 'loader'
[2022-02-03 09:43:04 DEBUG] [__init__:355] No controls to apply on 'loader'
[2022-02-03 09:43:04 DEBUG] [caching:25] Building activity cache...
[2022-02-03 09:43:04 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 09:43:04 INFO] [experiment:54] Validating the experiment's syntax
[2022-02-03 09:43:04 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 09:43:04 DEBUG] [secret:74] Loading secrets...
[2022-02-03 09:43:04 DEBUG] [secret:89] Secrets loaded
[2022-02-03 09:43:21 INFO] [experiment:103] Experiment looks valid
[2022-02-03 09:43:21 DEBUG] [caching:42] Clearing activities cache
[2022-02-03 09:43:21 DEBUG] [caching:25] Building activity cache...
[2022-02-03 09:43:21 DEBUG] [caching:35] Cached 2 activities
[2022-02-03 09:43:21 INFO] [experiment:182] Running experiment: What happens if we terminate an instance of the application?
[2022-02-03 09:43:21 DEBUG] [configuration:47] Loading configuration...
[2022-02-03 09:43:21 DEBUG] [secret:74] Loading secrets...
[2022-02-03 09:43:21 DEBUG] [secret:89] Secrets loaded
[2022-02-03 09:43:21 DEBUG] [__init__:39] Initializing controls
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'experiment'
[2022-02-03 09:43:22 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:22 DEBUG] [activity:179]   => succeeded with '{'status': 200, 'headers': {'Server': 'nginx/1.15.4', 'Date': 'Thu, 03 Feb 2022 09:43:22 GMT', 'Content-Type': 'text/html', 'Content-Length': '208', 'Last-Modified': 'Thu, 03 Feb 2022 07:21:47 GMT', 'Connection': 'keep-alive', 'ETag': '"61fb828b-d0"', 'Accept-Ranges': 'bytes'}, 'body': "<HTML>\n<HEAD>\n<TITLE>This page is on newapp-v2-866f8798cd-8s424 and is version v2</TITLE>\n</HEAD><BODY>\n<H1>THIS IS HOST newapp-v2-866f8798cd-8s424</H1>\n<H2>And we're running version: v2</H2>\n</BODY>\n</HTML>\n"}'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 DEBUG] [hypothesis:212] allowed tolerance is 200
[2022-02-03 09:43:22 INFO] [hypothesis:222] Steady state hypothesis is met!
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:22 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:22 INFO] [activity:160] Action: terminate-app-pod
[2022-02-03 09:43:22 DEBUG] [python:34] Activity 'terminate-app-pod' loaded from '/usr/local/lib/python3.8/site-packages/chaosk8s/pod/actions.py'
[2022-02-03 09:43:23 DEBUG] [actions:193] Found 3 pods labelled 'app=newapp' in ns default
[2022-02-03 09:43:23 DEBUG] [activity:181]   => succeeded without any result value
[2022-02-03 09:43:23 INFO] [activity:197] Pausing after activity for 2s...
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'method'
[2022-02-03 09:43:25 INFO] [hypothesis:184] Steady state hypothesis: The app is healthy
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'hypothesis'
[2022-02-03 09:43:25 DEBUG] [__init__:355] No controls to apply on 'activity'
[2022-02-03 09:43:25 INFO] [activity:160] Probe: app-responds-to-requests
[2022-02-03 09:43:25 DEBUG] [activity:233] Activity failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 156, in _new_conn
        conn = connection.create_connection(
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
        raise err
      File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 665, in urlopen
        httplib_response = self._make_request(
      File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/usr/local/lib/python3.8/http/client.py", line 1230, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1276, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1225, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/local/lib/python3.8/http/client.py", line 1004, in _send_output
        self.send(msg)
      File "/usr/local/lib/python3.8/http/client.py", line 944, in send
        self.connect()
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 184, in connect
        conn = self._new_conn()
      File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 168, in _new_conn
        raise NewConnectionError(
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7facc1a97e20>: Failed to establish a new connection: [Errno 111] Connection refused
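
One way to tell a transient refusal (for example, one that only occurs in the first seconds after the pod is terminated) from a persistent one is to repeat the same HTTP request several times from inside the cluster and record each outcome. The following is a minimal sketch and not part of the experiment above: the function name `probe_with_retries` and its parameters are hypothetical, it uses only the Python standard library, and only the URL and Host header are taken from the experiment.

```python
import time
import urllib.error
import urllib.request


def probe_with_retries(url, attempts=5, delay=1.0, timeout=10,
                       host_header=None, fetch=None):
    """Request `url` several times; return one outcome per attempt.

    Each outcome is either an HTTP status code (int) or an
    "error: ..." string describing why the connection failed.
    `fetch` is an optional replacement for the real HTTP call
    (hypothetical hook, useful for testing without a network).
    """
    def default_fetch(target):
        request = urllib.request.Request(target)
        if host_header:
            request.add_header("Host", host_header)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status

    fetch = fetch or default_fetch
    outcomes = []
    for attempt in range(attempts):
        try:
            outcomes.append(fetch(url))
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status (4xx/5xx).
            outcomes.append(exc.code)
        except (urllib.error.URLError, OSError) as exc:
            # No HTTP answer at all, e.g. [Errno 111] Connection refused.
            outcomes.append(f"error: {exc}")
        if attempt < attempts - 1:
            time.sleep(delay)
    return outcomes


if __name__ == "__main__":
    # URL and Host header as used by the probe in the experiment.
    for outcome in probe_with_retries("http://newapp",
                                      host_header="newapp.example.com"):
        print(outcome)
```

If every attempt is refused, the problem is unlikely to be a timing issue with the 2s pause; if only the first few are refused, a longer pause after the action may be enough.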

Link to the container used for the pod: https://github.com/vfarcic/chaostoolkit-container-image/blob/master/Dockerfile

Manifest for the pod/job:

apiVersion: batch/v1
kind: Job
metadata:
  name: newapp-chaos
spec:
  activeDeadlineSeconds: 600
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: newapp-job
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: newapp-chaos
      restartPolicy: Never
      containers:
      - name: chaostoolkit
        image: vfarcic/chaostoolkit:1.4.1-2
        args:
        - --verbose
        - run
        - /experiment/health-http.yaml
        env:
        - name: CHAOSTOOLKIT_IN_POD
          value: "true"
        volumeMounts:
        - name: config
          mountPath: /experiment
          readOnly: true
        resources:
          limits:
            cpu: 20m
            memory: 64Mi
          requests:
            cpu: 20m
            memory: 64Mi
      volumes:
      - name: config
        configMap:
          name: newapp-config

gopimandala avatar Feb 03 '22 10:02 gopimandala

After struggling for over a day, I tried reinstalling Istio and it worked fine afterwards. What a bummer!

gopimandala avatar Feb 04 '22 05:02 gopimandala