
[Bug]: TLS certificate error from webhook prevents workflow creation until redeployment

Open vyatyos opened this issue 2 months ago • 2 comments

Chaos Mesh Version

v2.7.3

Kubernetes Version

v1.33

Describe the bug

When deploying Chaos-Mesh via ArgoCD (standalone application, default Helm chart with custom values.yaml), workflows cannot be created due to an internal webhook TLS certificate error. My workloads are on EKS, applications are deployed with ArgoCD, and Chaos-Mesh uses containerd as the runtime.

Running kubectl get secrets in the chaos-mesh namespace correctly lists 4 certificate secrets (chaosd-client-certs, daemon-certs, daemon-client-certs, webhook-certs).

Is there anything we could be missing in our Chaos-Mesh configuration?

To Reproduce

  1. Deploy Chaos-Mesh using the Helm chart from the repo and the values.yaml below.

chaos-mesh deployment values.yaml

chaosDaemon:
  runtime: containerd 
  socketPath: /run/containerd/containerd.sock

controllerManager:
  replicaCount: 1

dashboard:
  securityMode: false

  persistentVolume:
    enabled: true
    storageClassName: efs
    size: 1Gi

  ingress:
    enabled: true
    hosts:
      - name: <redacted>
    ingressClassName: nginx
  2. Wait for Chaos-Mesh to be fully deployed.
  3. Attempt to deploy a Chaos-Mesh Workflow.

Error:

Failed sync attempt to : one or more objects failed to apply, reason: Internal error occurred: failed calling webhook "mworkflow.kb.io": failed to call webhook: Post "https://chaos-mesh-controller-manager.chaos-mesh.svc:443/mutate-chaos-mesh-org-v1alpha1-workflow?... tls: failed to verify certificate: x509: certificate signed by unknown authority"

vyatyos, Sep 29 '25 14:09

Hi @vyatyos, by default Chaos Mesh uses self-signed certificates, which are generated by Helm:

https://helm.sh/docs/chart_template_guide/function_list/#cryptographic-and-security-functions

You can find where genCA is used in the Helm chart: helm/chaos-mesh/templates/_certs.tpl

So it seems some security policy in your environment might not trust this certificate, which would need to be resolved with your security team. Alternatively, you may want to use real, trusted certificates instead of the generated self-signed ones.
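For illustration, here is a minimal Helm template sketch of the genCA / genSignedCert pattern. It is a hypothetical reconstruction, not the contents of _certs.tpl: the secret and webhook configuration names, the validity period, and the rules are assumptions, while the service name, namespace, webhook name and path are taken from the error message in the issue. The CA certificate goes into the webhook's caBundle and the serving certificate into the secret; the API server accepts the webhook's TLS certificate only if it chains to a CA in caBundle, otherwise the call fails with "x509: certificate signed by unknown authority".

```yaml
# Hypothetical sketch of the genCA / genSignedCert pattern (not _certs.tpl).
# genCA "<CN>" <days> creates a fresh self-signed CA on every render.
{{- $ca := genCA "chaos-mesh-webhook-ca" 1825 }}
{{- $svc := "chaos-mesh-controller-manager.chaos-mesh.svc" }}
# genSignedCert "<CN>" <IP SANs> <DNS SANs> <days> <CA> issues the serving
# certificate for the webhook Service DNS name, signed by that CA.
{{- $cert := genSignedCert $svc (list) (list $svc) 1825 $ca }}
apiVersion: v1
kind: Secret
metadata:
  name: webhook-certs      # hypothetical; the real chart's name may differ
  namespace: chaos-mesh
type: kubernetes.io/tls
data:
  tls.crt: {{ $cert.Cert | b64enc }}
  tls.key: {{ $cert.Key | b64enc }}
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: chaos-mesh-mutation-example   # hypothetical name
webhooks:
  - name: mworkflow.kb.io
    clientConfig:
      # The API server trusts the webhook's serving certificate only if it
      # chains to a CA in caBundle; otherwise the call fails with
      # "x509: certificate signed by unknown authority".
      caBundle: {{ $ca.Cert | b64enc }}
      service:
        name: chaos-mesh-controller-manager
        namespace: chaos-mesh
        path: /mutate-chaos-mesh-org-v1alpha1-workflow
    rules:
      - apiGroups: ["chaos-mesh.org"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
    sideEffects: None
    admissionReviewVersions: ["v1"]
```

Because genCA runs at render time, rendering the template twice (for example with helm template) yields two different CAs, so a caBundle from one render will not validate a serving certificate from another; in this pattern both objects have to come from the same render.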

STRRL, Sep 30 '25 01:09

Also, if you're using cert-manager, please take a look at this configuration:

webhook.certManager
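If cert-manager is installed, this chart option can be switched on in values.yaml so that cert-manager issues the webhook certificate instead of the certificates generated with genCA. A minimal fragment, assuming cert-manager is already running in the cluster and that webhook.certManager.enabled is the relevant toggle (worth confirming against the chart's values.yaml for the version in use):

```yaml
# values.yaml fragment (sketch): let cert-manager provide the webhook
# certificate instead of the chart's self-signed, render-time CA.
# Assumes cert-manager is already installed in the cluster.
webhook:
  certManager:
    enabled: true
```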

STRRL, Sep 30 '25 01:09