[Bug]: TLS certificate error from webhook prevents workflow creation until redeployment
Chaos Mesh Version
v2.7.3
Kubernetes Version
v1.33
Describe the bug
When deploying Chaos-Mesh via ArgoCD (standalone application, default Helm chart with custom values.yaml), workflows cannot be created due to an internal webhook TLS certificate error. My workloads are on EKS, applications are deployed with ArgoCD, and Chaos-Mesh uses containerd as the runtime.
Querying kubectl get secrets in the chaos-mesh namespace correctly yields 4 certificates (chaosd-client-certs, daemon-certs, daemon-client-certs, webhook-certs).
Is there anything we could be missing in our Chaos-Mesh configuration?
To Reproduce
- Deploy Chaos-Mesh using the Helm chart from the repo and the values.yaml below.
chaos-mesh deployment values.yaml
chaosDaemon:
  runtime: containerd
  socketPath: /run/containerd/containerd.sock
controllerManager:
  replicaCount: 1
dashboard:
  securityMode: false
  persistentVolume:
    enabled: true
    storageClassName: efs
    size: 1Gi
  ingress:
    enabled: true
    hosts:
      - name: <redacted>
    ingressClassName: nginx
- Wait for Chaos-Mesh to be fully deployed.
- Attempt to deploy a Chaos-Mesh workflow.
Error:
Failed sync attempt to : one or more objects failed to apply, reason: Internal error occurred: failed calling webhook "mworkflow.kb.io": failed to call webhook: Post "https://chaos-mesh-controller-manager.chaos-mesh.svc:443/mutate-chaos-mesh-org-v1alpha1-workflow?... tls: failed to verify certificate: x509: certificate signed by unknown authority"
Hi @vyatyos, Chaos Mesh uses self-signed certificates by default, which are generated by Helm:
https://helm.sh/docs/chart_template_guide/function_list/#cryptographic-and-security-functions
You can find the usage of genCA in the Helm chart: helm/chaos-mesh/templates/_certs.tpl
So it seems there might be some security policies that do not trust this certificate, and that should be resolved with your security team. You may want to use real trusted certificates instead of the generated self-signed ones.
Also, please take a look at this configuration if you're using cert-manager:
webhook.certManager
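For reference, a minimal sketch of how that option might look in the values.yaml from this issue. It assumes cert-manager is already installed in the cluster and that the chart exposes the switch as webhook.certManager.enabled; please confirm the exact key against the values.yaml of your chart version before relying on it.

```yaml
# Sketch only: the key name `enabled` under webhook.certManager is an
# assumption; verify it against the chart's values.yaml for your version.
webhook:
  certManager:
    # Let cert-manager issue the webhook certificates instead of
    # the self-signed ones generated by Helm's genCA.
    enabled: true
```

This fragment would sit at the top level of values.yaml, alongside chaosDaemon, controllerManager, and dashboard.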