cluster-api-operator

Installing the helm chart without the --wait parameter fails

zioproto opened this issue Jan 23 '24 · 5 comments

What steps did you take and what happened:

minikube delete
minikube start
helm repo add jetstack https://charts.jetstack.io
helm repo add cluster-api-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm upgrade cert-manager jetstack/cert-manager \
    --install \
    --create-namespace \
    --wait \
    --namespace cert-manager \
    --set installCRDs=true

helm install capi cluster-api-operator/cluster-api-operator \
    --set infrastructure=azure \
    --set addon=helm \
    --set image.manager.tag=v0.8.1 \
    --debug

This fails with the following output:

Error: INSTALLATION FAILED: failed post-install: warning: Hook post-install cluster-api-operator/templates/infra.yaml failed: 1 error occurred:
	* Internal error occurred: failed calling webhook "vinfrastructureprovider.kb.io": failed to call webhook: Post "https://capi-operator-webhook-service.default.svc:443/mutate-operator-cluster-x-k8s-io-v1alpha2-infrastructureprovider?timeout=10s": dial tcp 10.103.91.243:443: connect: connection refused


helm.go:84: [debug] failed post-install: warning: Hook post-install cluster-api-operator/templates/infra.yaml failed: 1 error occurred:
	* Internal error occurred: failed calling webhook "vinfrastructureprovider.kb.io": failed to call webhook: Post "https://capi-operator-webhook-service.default.svc:443/mutate-operator-cluster-x-k8s-io-v1alpha2-infrastructureprovider?timeout=10s": dial tcp 10.103.91.243:443: connect: connection refused


INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:154
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:1039
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:267
runtime.goexit
	runtime/asm_arm64.s:1197

What did you expect to happen: I expected the Helm chart to install successfully.

Anything else you would like to add:

If you add the --wait parameter, it works:

helm uninstall capi
helm install capi cluster-api-operator/cluster-api-operator \
    --set infrastructure=azure \
    --set addon=helm \
    --set image.manager.tag=v0.8.1 \
    --debug \
    --wait
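
The connection-refused error suggests the webhook Service had no ready endpoints yet when the post-install hook fired. A quick way to confirm this while the failing install is running (the service name and default namespace are taken from the error message above):

kubectl get endpoints capi-operator-webhook-service -n default
kubectl get pods -n default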

Environment:

  • Cluster-api-operator version: v0.8.1
  • Cluster-api version:
  • Minikube/KIND version: v1.32.0
  • Kubernetes version: (use kubectl version): v1.28.3
  • OS (e.g. from /etc/os-release):

/kind bug

zioproto · Jan 23 '24

This issue is currently awaiting triage.

If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Jan 23 '24

I'm not sure we can fix this: the controllers need to be running for the webhooks to work, so the operator CRs can only be applied once the controller is up.
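
In practice that means the install has to pause until the Deployment backing the webhook Service is Available before the provider CRs are applied, which is what --wait achieves. A minimal manual equivalent, assuming the chart names its Deployment capi-operator-cluster-api-operator (an assumption based on the release name above, not a verified chart value):

kubectl wait deployment/capi-operator-cluster-api-operator \
    --for=condition=Available \
    --timeout=120s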

alexander-demicev · Mar 13 '24

Not the cleanest of approaches, but we could create a Job that checks (e.g. with curl) whether the webhook is reachable yet and terminates once it is. That Job would then have to be installed as a post-install hook with a weight lower than that of the hook that failed, so it runs first; a rough sketch is below.
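
A minimal sketch of what such a Job could look like (the name, image, and hook weight are illustrative assumptions, not values from the chart; the probe URL is taken from the error above, and the failing infra.yaml hook would need a higher weight so it runs after this check):

apiVersion: batch/v1
kind: Job
metadata:
  name: wait-for-webhook
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 30
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check
          image: curlimages/curl
          command:
            - sh
            - -c
            # Any HTTP response (even an error status) means the webhook server
            # is accepting connections; connection refused keeps the loop going.
            - until curl -k -s -o /dev/null https://capi-operator-webhook-service.default.svc:443/; do sleep 2; done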


In case a desperate Flux user is reading this: setting hr.spec.persistentClient=false seems to do the trick if you hit the same issue when deploying via the Flux helm-controller (HelmRelease); a sketch is below.
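
For reference, a hedged sketch of where that field sits on a HelmRelease (the resource names, namespace, and source reference are illustrative; the field itself defaults to true):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: capi-operator
  namespace: default
spec:
  interval: 10m
  # Disable the persistent Kubernetes client so each reconciliation builds a
  # fresh client, which appears to avoid the webhook failure described above.
  persistentClient: false
  chart:
    spec:
      chart: cluster-api-operator
      sourceRef:
        kind: HelmRepository
        name: capi-operator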

mxmxchere · May 16 '24

Is this something that we want to fix? I verified that creating a Job with a webhook check solves this problem: https://github.com/kubernetes-sigs/cluster-api-operator/compare/main...willie-yao:cluster-api-operator:no-wait?expand=1

Not sure if this is something we want to include and maintain as part of the chart though. I've also added warnings about the --wait flag being required in #550.

willie-yao · Jun 10 '24

If you add the --wait parameter, it works

We can probably close this and refer to #550 as the fix, as the description suggests. @zioproto WDYT?

furkatgofurov7 · Jun 11 '24

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Sep 09 '24

/close

We can probably close this and refer to https://github.com/kubernetes-sigs/cluster-api-operator/pull/550 as the fix, as the description suggests. @zioproto WDYT?

I'll close this now since it's marked as stale and it seems like we can agree #550 is a fix. I can re-open if anyone disagrees!

willie-yao · Sep 10 '24

@willie-yao: Closing this issue.

In response to this:

/close

We can probably close this and refer to https://github.com/kubernetes-sigs/cluster-api-operator/pull/550 as the fix, as the description suggests. @zioproto WDYT?

I'll close this now since it's marked as stale and it seems like we can agree #550 is a fix. I can re-open if anyone disagrees!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · Sep 10 '24