
Unable to initialize probes

Open robertb724 opened this issue 2 years ago • 1 comments

What happened: When applying the example Prometheus probe, Litmus fails to initialize the probes.

What you expected to happen: I expect the probes to be initialized and Litmus to probe Prometheus.

Where can this issue be corrected? (optional)

How to reproduce it (as minimally and precisely as possible):

kubectl apply -f example.yaml

# contains the prom probe, which executes the query and matches the result against the expected criteria
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  appinfo:
    appns: "default"
    applabel: "app.kubernetes.io/instance=dummy-dev"
    appkind: "deployment"
  chaosServiceAccount: litmus-runner
  experiments:
  - name: pod-delete
    spec:
      probe:
      - name: "check-probe-success"
        type: "promProbe"
        promProbe/inputs:
          # endpoint for the Prometheus service
          endpoint: "prometheus-kube-prometheus-prometheus.observability.svc.cluster.local:9090"
          # promql query, which should be executed
          query: "vector(1)"
          comparator:
            # criteria that the actual output must satisfy relative to the expected value
            # supports >=, <=, >, <, ==, != comparison
            criteria: "==" 
            # expected value, which should follow the specified criteria
            value: "1"
        mode: "Edge"
        runProperties:
          probeTimeout: 5
          interval: 5
          retry: 1
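
For readers unfamiliar with the comparator block, the following is a minimal Python sketch of how a criteria/value pair like the one in this manifest could be evaluated against a query result. It is an illustration only, not Litmus's implementation: the function name `criteria_met` and the choice to compare values as floats are assumptions made for this example.

```python
import operator

# The six comparison operators listed in the manifest comment.
COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
}


def criteria_met(actual: str, criteria: str, expected: str) -> bool:
    """Return True if `actual <criteria> expected` holds (hypothetical helper).

    Values arrive as strings, as in the manifest (value: "1"), and are
    compared numerically here; that is an assumption of this sketch.
    """
    try:
        compare = COMPARATORS[criteria]
    except KeyError:
        raise ValueError(f"unsupported criteria: {criteria!r}") from None
    return compare(float(actual), float(expected))


# The manifest's query is vector(1), so the expected result is 1.
print(criteria_met("1", "==", "1"))  # True
```

With this reading, the probe in the manifest should pass whenever the query returns 1, so the failure reported below happens before any comparison is made.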

Check the logs of the pod running the experiment:

➜  litmus k logs -f pod-delete-3xno17-xcj47 -n litmus
W0411 20:53:35.940676       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2022-04-11T20:53:35Z" level=info msg="Experiment Name: pod-delete"
time="2022-04-11T20:53:35Z" level=info msg="[PreReq]: Getting the ENV for the  experiment"
time="2022-04-11T20:53:35Z" level=error msg="Unable to initialize the probes, err: unable to Get the chaosengine, err: v1alpha1.ChaosEngine.Spec: v1alpha1.ChaosEngineSpec.Experiments: []v1alpha1.ExperimentList: v1alpha1.ExperimentList.Spec: v1alpha1.ExperimentAttributes.Probe: []v1alpha1.ProbeAttributes: v1alpha1.ProbeAttributes.CmdProbeInputs: v1alpha1.CmdProbeInputs.Source: ReadString: expects \" or n, but found {, error found in #10 byte of ...|\"source\":{}},\"httpPr|..., bigger context ...|e\":[{\"cmdProbe/inputs\":{\"comparator\":{},\"source\":{}},\"httpProbe/inputs\":{\"method\":{\"get\":{},\"post\":{|..."

Anything else we need to know?:

Running kubectl get chaosengine engine-nginx -n litmus -o yaml shows that empty versions of the other probe types were added to the resource. The error message seems to indicate that the issue is with the cmdProbe, which we did not mention in our manifest.

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
  namespace: litmus
spec:
  annotationCheck: "false"
  appinfo:
    appkind: deployment
    applabel: app.kubernetes.io/instance=dummy-dev
    appns: defauult
  chaosServiceAccount: litmus-runner
  components:
    runner:
      image: litmuschaos/chaos-runner:2.6.0
      resources: {}
  engineState: stop
  experiments:
  - name: pod-delete
    spec:
      components:
        resources: {}
        statusCheckTimeouts: {}
      probe:
      - cmdProbe/inputs:
          comparator: {}
          source: {}
        httpProbe/inputs:
          method:
            get: {}
            post: {}
        k8sProbe/inputs: {}
        mode: Edge
        name: check-probe-success
        promProbe/inputs:
          comparator:
            criteria: ==
            value: "1"
          endpoint: prometheus-kube-prometheus-prometheus.observability.svc.cluster.local:9090
          query: vector(1)
        runProperties:
          interval: 5
          probeTimeout: 5
          retry: 1
        type: promProbe
status:
  engineStatus: completed
  experiments:
  - experimentPod: pod-delete-u545hj-jtx6s
    lastUpdateTime: "2022-04-11T21:12:48Z"
    name: pod-delete
    runner: engine-nginx-runner
    status: Completed
    verdict: Pass
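
The stored probe entry can also be inspected programmatically. The sketch below is a hedged illustration in Python, not part of Litmus: the helper names `stray_probe_inputs` and `source_is_string` are hypothetical, and the dictionary is a hand-copied version of the probe entry shown above. It lists the `<type>/inputs` keys that do not match the declared probe type, and checks whether `cmdProbe/inputs.source` is a string or null, which is what the decoder in the error message ("expects \" or n, but found {") appears to require.

```python
from __future__ import annotations


def stray_probe_inputs(probe: dict) -> list[str]:
    """Return '<type>/inputs' keys that do not belong to the declared probe type."""
    expected = f"{probe.get('type')}/inputs"
    return sorted(k for k in probe if k.endswith("/inputs") and k != expected)


def source_is_string(probe: dict) -> bool:
    """True if cmdProbe/inputs.source is absent, null, or a string."""
    source = probe.get("cmdProbe/inputs", {}).get("source")
    return source is None or isinstance(source, str)


# Probe entry copied by hand from the kubectl output above.
probe = {
    "cmdProbe/inputs": {"comparator": {}, "source": {}},
    "httpProbe/inputs": {"method": {"get": {}, "post": {}}},
    "k8sProbe/inputs": {},
    "mode": "Edge",
    "name": "check-probe-success",
    "promProbe/inputs": {
        "comparator": {"criteria": "==", "value": "1"},
        "endpoint": "prometheus-kube-prometheus-prometheus.observability.svc.cluster.local:9090",
        "query": "vector(1)",
    },
    "runProperties": {"interval": 5, "probeTimeout": 5, "retry": 1},
    "type": "promProbe",
}

print(stray_probe_inputs(probe))  # ['cmdProbe/inputs', 'httpProbe/inputs', 'k8sProbe/inputs']
print(source_is_string(probe))    # False
```

Here `source` is stored as an empty object, so a reader that expects a string there would fail on it even though the manifest only declares a promProbe.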

robertb724 • Apr 11 '22 21:04

Tagging @AmitKumarDas

Jonsy13 • Aug 08 '22 11:08

Hi @robertb724, this issue seems to be caused by a version mismatch between litmus-go and chaos-operator. Can you try running this with the same (compatible) version for both and see if the issue persists?

avaakash • Oct 13 '22 13:10
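
One rough way to act on the version-mismatch suggestion is to compare the tags of the Litmus images in use. The Python sketch below is only an illustration: the helper names are hypothetical, only the chaos-runner image and tag come from the ChaosEngine above, and the other two tags are made-up placeholders to be replaced with whatever the cluster actually runs. Image references pinned by digest are not handled.

```python
from __future__ import annotations


def image_tag(image: str) -> str:
    """Return the tag of an image reference, or "" if it has none."""
    last_segment = image.rsplit("/", 1)[-1]
    _, sep, tag = last_segment.partition(":")
    return tag if sep else ""


def distinct_tags(images: list[str]) -> set[str]:
    """Collect the set of tags used across the given image references."""
    return {image_tag(image) for image in images}


images = [
    "litmuschaos/chaos-runner:2.6.0",    # taken from the ChaosEngine above
    "litmuschaos/chaos-operator:2.6.0",  # hypothetical tag, replace with the real one
    "litmuschaos/go-runner:2.7.0",       # hypothetical tag, replace with the real one
]

tags = distinct_tags(images)
if len(tags) > 1:
    print("version mismatch:", sorted(tags))
else:
    print("all images use tag:", sorted(tags))
```

Matching tags do not by themselves prove compatibility, but differing tags are a quick signal that the operator and the experiment image may disagree on the ChaosEngine schema.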

Closing due to inactivity. Feel free to reopen this issue if the problem persists.

avaakash • Nov 14 '22 07:11