x509: certificate signed by unknown authority error when starting subscriber

Open johnqa opened this issue 2 years ago • 16 comments

What happened:

I have installed Litmus with the Helm chart and logged in to the Portal.

The self-agent is in Pending, and the litmusportal-subscriber pod fails with the error "failed to confirm cluster": Post "https://litmusdns/backend/query": x509: certificate signed by unknown authority

My ingress spec looks like this:

spec:
  rules:
  - host: litmusdns
    http:
      paths:
      - backend:
          service:
            name: litmus-frontend-service
            port:
              number: 9091
        path: /
        pathType: ImplementationSpecific
      - backend:
          service:
            name: litmus-server-service
            port:
              number: 9002
        path: /backend/(.*)
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - litmusdns

What could be causing this error?

Thank you, John

johnqa avatar Mar 25 '22 14:03 johnqa

@johnqa if you are using a custom domain/host with a self-signed certificate, you need to either configure Litmus with the TLS certificate or use the SSL skip feature to skip SSL/TLS verification. Alternatively, you can remove TLS from the ingress if you don't have a certificate configured.

gdsoumya avatar Mar 25 '22 14:03 gdsoumya
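
The two options above (trusting the self-signed certificate versus skipping verification) can be illustrated with a minimal Python sketch. This is not the subscriber's own code; the function name build_tls_context and its parameters are hypothetical and only show what each choice means for a TLS client.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None,
                      skip_verify: bool = False) -> ssl.SSLContext:
    """Build a client TLS context (hypothetical helper, for illustration only).

    ca_file:     path to a PEM file containing the CA or self-signed cert
                 to trust in addition to the system trust store.
    skip_verify: when True, disable certificate and hostname checks,
                 which is conceptually what an "SSL skip" switch does.
    """
    # With cafile=None the system trust store is loaded; a certificate
    # signed by an unknown authority is then rejected during the handshake.
    ctx = ssl.create_default_context(cafile=ca_file)
    if skip_verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = build_tls_context()
relaxed = build_tls_context(skip_verify=True)
print(strict.verify_mode, relaxed.verify_mode)
```

Skipping verification removes protection against impersonation of the server, so supplying the certificate is generally the safer of the two choices when it is available.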

So I added SKIP_SSL_VERIFY to the subscriber deployment, but now I have another error:

required key ACCESS_KEY missing value

johnqa avatar Mar 28 '22 09:03 johnqa

Is there a secret resource named agent-secret present in the agent ns? That should have the access key

gdsoumya avatar Mar 28 '22 09:03 gdsoumya

yes, the secret is there, but what can I do with it?

johnqa avatar Mar 28 '22 09:03 johnqa

kubectl get secret agent-secret -n <ns> -oyaml and share the output

gdsoumya avatar Mar 28 '22 09:03 gdsoumya
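
Values under a Kubernetes Secret's data field are base64-encoded, so the output of the command above is not directly readable. A minimal sketch for decoding it, assuming the secret is exported as JSON (for example with -o json instead of -oyaml); the helper name decode_secret_data and the sample values are hypothetical, and the key names are taken from the ones mentioned in this thread.

```python
import base64
import json


def decode_secret_data(secret_json: str) -> dict:
    """Return the decoded key/value pairs from a Secret exported as JSON."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


# Sample input with made-up values; a real secret would come from kubectl.
sample = json.dumps({
    "kind": "Secret",
    "data": {
        "ACCESS_KEY": base64.b64encode(b"example-access-key").decode("ascii"),
        "CLUSTER_ID": base64.b64encode(b"example-cluster-id").decode("ascii"),
    },
})
decoded = decode_secret_data(sample)
print(sorted(decoded))
```

Decoded values are credentials, so avoid pasting them into a public issue.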

I have added ACCESS_KEY and CLUSTER_ID to the deployment config, but now I have another error:

level=fatal msg="failed to parse cluster confirm data" data="<html>\r\n<head><title>405 Not Allowed</title></head>\r\n<body>\r\n<center><h1>405 Not Allowed</h1></center>\r\n<hr><center>nginx/1.21.6</center>\r\n</body>\r\n</html>\r\n" error="invalid character '<' looking for beginning of value"

johnqa avatar Mar 28 '22 09:03 johnqa
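
The "invalid character '<' looking for beginning of value" message is what a JSON parser reports when it is handed an HTML page: here the body is the nginx 405 page rather than a JSON reply from the server. A minimal sketch of the same failure, assuming only that the client expects JSON; the function name parse_confirm_response and the sample JSON shape are hypothetical.

```python
import json

# Shortened copy of the HTML body shown in the log line above.
NGINX_405_BODY = (
    "<html>\r\n<head><title>405 Not Allowed</title></head>\r\n"
    "<body>\r\n<center><h1>405 Not Allowed</h1></center>\r\n</body>\r\n</html>\r\n"
)


def parse_confirm_response(body: str):
    """Return (data, None) for a JSON body, or (None, reason) otherwise."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as err:
        hint = "looks like HTML from a proxy" if body.lstrip().startswith("<") else "not JSON"
        return None, f"{hint}: {err.msg} at position {err.pos}"


print(parse_confirm_response(NGINX_405_BODY))
print(parse_confirm_response('{"data": {}}'))
```

An HTML body therefore suggests the request was answered by the ingress or proxy layer, not by the backend the path was meant to reach.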

Can you try to do a fresh install with the skip SSL env var set from the very beginning in the manifest? I think there might be some issues in the manual changes

gdsoumya avatar Mar 28 '22 09:03 gdsoumya

I am deploying using litmus helm chart, and I don't see where in values.yaml I can put these values for subscriber.

johnqa avatar Mar 28 '22 09:03 johnqa

Use this block to add any arbitrary envs for the server https://github.com/litmuschaos/litmus-helm/blob/cdfc397e0e3795ad62266eaf12b6027f2a38759e/charts/litmus/values.yaml#L192

gdsoumya avatar Mar 28 '22 09:03 gdsoumya

Just add SKIP_SSL_VERIFY: "true" in the generic block

gdsoumya avatar Mar 28 '22 09:03 gdsoumya

I did it and the current error is:

level=fatal msg="failed to confirm cluster" data= error="Post \"http://litmus.dnsname.int/backend/query\": dial tcp 10.238.40.210:80: i/o timeout"

johnqa avatar Mar 28 '22 10:03 johnqa

Can you see if you can curl/wget that url from inside the cluster network? Maybe just start a bash pod in the cluster and try accessing that URL, if it doesn't work then there's some networking or domain setup issue

gdsoumya avatar Mar 29 '22 14:03 gdsoumya
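
As an alternative to curl/wget, the check suggested above can be sketched in Python so that the failure modes seen in this thread (certificate rejection, timeout, HTTP error status) are told apart. This is a rough sketch: the names classify_error and probe and the returned labels are hypothetical, and the empty JSON body is only a placeholder, not a valid query.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_error(exc: BaseException) -> str:
    """Map an exception raised by urlopen to a short failure label."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http-{exc.code}"  # the server or proxy answered with an error status
    # URLError wraps the underlying cause in .reason; other errors are used as-is.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "tls-verify-failed"  # e.g. certificate signed by unknown authority
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "timeout"  # e.g. dial tcp ...: i/o timeout
    if isinstance(reason, ConnectionRefusedError):
        return "connection-refused"
    return "other"


def probe(url: str, timeout: float = 5.0) -> str:
    """POST a placeholder JSON body to url; return 'http-<status>' or a failure label."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"http-{resp.status}"
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        return classify_error(exc)


# Example call, to be run from a pod inside the cluster:
# print(probe("https://litmus.dnsname.int/backend/query"))
```

Running it against both the http and https forms of the URL shows which scheme and port are actually reachable from inside the cluster network.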

Using curl I was not able to connect to http://litmus.dnsname.int/backend/query but I was able to connect to https://litmus.dnsname.int/backend/query

I have changed the ingress settings to use https instead of http and redeployed, but now the subscriber fails again with the earlier error:

level=fatal msg="failed to parse cluster confirm data" data="<html>\r\n<head><title>405 Not Allowed</title></head>\r\n<body>\r\n<center><h1>405 Not Allowed</h1></center>\r\n<hr><center>nginx/1.21.6</center>\r\n</body>\r\n</html>\r\n" error="invalid character '<' looking for beginning of value"

johnqa avatar Mar 30 '22 09:03 johnqa

@johnqa to unblock yourself for now, you can just update the URL to http://litmusportal-server-service:9002/query for the self-agent and continue. Also, can you check the logs of the GraphQL server when the subscriber throws that error?

gdsoumya avatar Mar 30 '22 10:03 gdsoumya

Using http://litmus-server-service:9002/query finally worked.

Now I am worried about what will happen when I have to add an external agent :)

Thank you, John

johnqa avatar Mar 30 '22 11:03 johnqa

Awesome, so it's confirmed that the problem is with the domain name/TLS cert settings. IMO, if it is possible for you to disable TLS in the ingress and try with http, things should work fine.

gdsoumya avatar Mar 30 '22 12:03 gdsoumya