Installing the helm chart without the --wait parameter fails
What steps did you take and what happened:
minikube delete
minikube start
helm repo add jetstack https://charts.jetstack.io
helm repo add cluster-api-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm upgrade cert-manager jetstack/cert-manager \
--install \
--create-namespace \
--wait \
--namespace cert-manager \
--set installCRDs=true
helm install capi cluster-api-operator/cluster-api-operator \
--set infrastructure=azure \
--set addon=helm \
--set image.manager.tag=v0.8.1 \
--debug
This fails with the following output:
Error: INSTALLATION FAILED: failed post-install: warning: Hook post-install cluster-api-operator/templates/infra.yaml failed: 1 error occurred:
* Internal error occurred: failed calling webhook "vinfrastructureprovider.kb.io": failed to call webhook: Post "https://capi-operator-webhook-service.default.svc:443/mutate-operator-cluster-x-k8s-io-v1alpha2-infrastructureprovider?timeout=10s": dial tcp 10.103.91.243:443: connect: connection refused
helm.go:84: [debug] failed post-install: warning: Hook post-install cluster-api-operator/templates/infra.yaml failed: 1 error occurred:
* Internal error occurred: failed calling webhook "vinfrastructureprovider.kb.io": failed to call webhook: Post "https://capi-operator-webhook-service.default.svc:443/mutate-operator-cluster-x-k8s-io-v1alpha2-infrastructureprovider?timeout=10s": dial tcp 10.103.91.243:443: connect: connection refused
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:154
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:1115
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:1039
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:267
runtime.goexit
runtime/asm_arm64.s:1197
What did you expect to happen: I expected the helm chart to install successfully.
Anything else you would like to add:
If you add the --wait parameter, it works:
helm uninstall capi
helm install capi cluster-api-operator/cluster-api-operator \
--set infrastructure=azure \
--set addon=helm \
--set image.manager.tag=v0.8.1 \
--debug \
--wait
Environment:
- Cluster-api-operator version: v0.8.1
- Cluster-api version:
- Minikube/KIND version: v1.32.0
- Kubernetes version (use kubectl version): v1.28.3
- OS (e.g. from /etc/os-release):
/kind bug
This issue is currently awaiting triage.
If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm not sure that we can fix this: the controllers need to start for the webhooks to work, and only after the controller is up can the operator CRs be applied.
It's not the cleanest of approaches, but we could create a Job that checks (e.g. with curl) whether the webhook is responding yet; once it is, the Job terminates. That Job then has to be installed with a post-install hook weight lower than that of the hook that failed.
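For illustration, here is a minimal sketch of what such a hook Job could look like. It assumes the webhook Service name and namespace from the error above (capi-operator-webhook-service in default); the Job name, image, and exact hook weight are placeholders, not values taken from the chart:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: capi-operator-webhook-readiness-check   # hypothetical name
  annotations:
    "helm.sh/hook": post-install
    # Must be lower than the weight of the chart's hook that applies infra.yaml,
    # so Helm runs (and waits for) this Job first.
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 10
  activeDeadlineSeconds: 300
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check-webhook
          image: curlimages/curl:8.5.0   # illustrative image/tag
          command:
            - /bin/sh
            - -c
            # Poll until the webhook endpoint answers at all; -k skips TLS
            # verification because the serving cert is issued by cert-manager.
            - |
              until curl -k -sS -o /dev/null https://capi-operator-webhook-service.default.svc:443/; do
                echo "webhook not reachable yet, retrying..."
                sleep 5
              done
```

Since Helm executes hooks in ascending weight order and waits for each hook Job to complete, the InfrastructureProvider CR would only be applied once the webhook is actually reachable.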
In case a desperate Flux user is reading this: hr.spec.persistentClient=false seems to do the trick if you hit the same issue when deploying via the Flux helm-controller (HelmRelease).
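For anyone searching for where that field goes, a rough HelmRelease sketch follows; the names, namespace, chart version, and API version are illustrative and depend on your Flux setup, and only spec.persistentClient: false is the reported workaround:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1   # use whichever HelmRelease API version your Flux serves
kind: HelmRelease
metadata:
  name: capi-operator              # placeholder
  namespace: capi-operator-system  # placeholder
spec:
  interval: 10m
  # Reported workaround: don't reuse one persistent Kubernetes client
  # for the whole reconciliation.
  persistentClient: false
  chart:
    spec:
      chart: cluster-api-operator
      version: "0.8.1"             # illustrative
      sourceRef:
        kind: HelmRepository
        name: cluster-api-operator
  values:
    infrastructure: azure
    addon: helm
```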
Is this something that we want to fix? I verified that creating a Job with a webhook check solves this problem: https://github.com/kubernetes-sigs/cluster-api-operator/compare/main...willie-yao:cluster-api-operator:no-wait?expand=1
Not sure if this is something we want to include and maintain as part of the chart, though. I've also added warnings about the --wait flag being required in #550.
If you add the --wait parameter it works
Probably we can close it, since the description already describes the workaround, referring to #550 as a fix. @zioproto WDYT?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/close
Probably we can close it as the description describes referring to https://github.com/kubernetes-sigs/cluster-api-operator/pull/550 as a fix. @zioproto WDYT?
I'll close this now since it's marked as stale and it seems like we can agree #550 is a fix. I can re-open if anyone disagrees!
@willie-yao: Closing this issue.
In response to this:
/close
Probably we can close it as the description describes referring to https://github.com/kubernetes-sigs/cluster-api-operator/pull/550 as a fix. @zioproto WDYT?
I'll close this now since it's marked as stale and it seems like we can agree #550 is a fix. I can re-open if anyone disagrees!
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.