Helm Operator occasionally fails to fetch chart dependencies
Describe the bug
From time to time, helm-operator fails to pull the chart dependencies of a chart loaded from a git repository.
Describing the HelmRelease yields the following error message:
synchronization of release '<chart_from_git>' in namespace '<dev-namespace>' failed: failed to prepare chart for release: could not find : chart redis not found in https://charts.bitnami.com/bitnami
Our chart has two dependencies: redis and postgresql. It seems as if both are equally likely to fail. The logs of the helm operator show the following (newest entry first):
ts=2020-06-09T07:11:13.887767314Z caller=logwriter.go:28 component=helm version=v3 info="Deleting newly downloaded charts, restoring pre-update state"
ts=2020-06-09T07:11:13.887716888Z caller=logwriter.go:28 component=helm version=v3 info="Save error occurred: could not find : chart postgresql not found in https://charts.bitnami.com/bitnami"
ts=2020-06-09T07:11:13.506827155Z caller=logwriter.go:28 component=helm version=v3 info="Downloading postgresql from repo https://charts.bitnami.com/bitnami"
ts=2020-06-09T07:11:13.506790379Z caller=logwriter.go:28 component=helm version=v3 info="Saving 2 charts"
ts=2020-06-09T07:11:13.495880613Z caller=logwriter.go:28 component=helm version=v3 info="Downloading postgresql from repo https://charts.bitnami.com/bitnami"
ts=2020-06-09T07:11:13.495851862Z caller=logwriter.go:28 component=helm version=v3 info="Saving 2 charts"
ts=2020-06-09T07:11:13.483137413Z caller=logwriter.go:28 component=helm version=v3 info="Downloading postgresql from repo https://charts.bitnami.com/bitnami"
ts=2020-06-09T07:11:13.481584789Z caller=logwriter.go:28 component=helm version=v3 info="Saving 2 charts"
ts=2020-06-09T07:11:13.474450824Z caller=logwriter.go:28 component=helm version=v3 info="Downloading postgresql from repo https://charts.bitnami.com/bitnami"
ts=2020-06-09T07:11:13.472042753Z caller=logwriter.go:28 component=helm version=v3 info="Saving 2 charts"
The weird thing is that it usually takes ~5 minutes, after which updating the dependencies works fine. I was thinking it might be a network error, but I cannot find any evidence of other pods having network issues when these errors occur. To be safe, I've started a batch job that just pulls the chart and runs helm dep build every 5 minutes, and it hasn't failed once over the last 2 days.
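For reference, that batch job is roughly the following CronJob. The image, shell steps, and credentials handling are simplified placeholders rather than the exact manifest I run (the job would also need SSH credentials mounted to clone the private repo, which I've left out here):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: chart-dep-check
  namespace: dev-namespace
spec:
  schedule: "*/5 * * * *"              # every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: dep-build
              image: alpine/helm:3.2.4   # placeholder; any image with helm 3 and a shell works
              command:
                - /bin/sh
                - -c
                - |
                  # clone the chart source and resolve its dependencies,
                  # mimicking what the operator does on every sync
                  apk add --no-cache git openssh-client
                  # same (redacted) repo URL as in the HelmRelease below
                  git clone --depth 1 --branch development [email protected]:company/repo.git /tmp/repo
                  helm dep build /tmp/repo/k8s/helm/chart
```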
Right now, I'm out of ideas. Can you help me debug this issue further?
To Reproduce
I'm not sure how to reproduce this reliably, but this is our setup.
Steps to reproduce the behaviour:
0. We have the following setup: Kubernetes 1.15 running on AWS EKS, Helm Operator version 1.1.0.
- We have a chart in git that uses the following requirements.yaml:
dependencies:
  - name: postgresql
    version: ~8.9.x
    condition: dbConfig.internal.enabled
    repository: https://charts.bitnami.com/bitnami
  - name: redis
    version: ~10.6.x
    condition: redisConfig.internal.enabled
    repository: https://charts.bitnami.com/bitnami
- Our helmrelease.yaml looks like this (I've omitted all the annotations and changed names):
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: dev-release
  namespace: "dev-namespace"
spec:
  releaseName: dev
  chart:
    git: [email protected]:company/repo.git
    path: k8s/helm/chart
    ref: development
  values:
    ....
- Wait for everything to be deployed
Expected behavior
I expect the dependency update to work every time.
Additional context:
- Helm Operator version: 1.1.0
- Targeted Helm version: 3
- Kubernetes version:
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
- Git provider: Bitbucket
- Container registry provider: AWS ECR
Seeing the same behavior on our end.
Are there any plans to fix this bug? @yebyen @stefanprodan
@hsharma96 Thanks for your report!
This issue has been open for a while, and Helm Operator has seen a new release in the past few days. I see lots of thumbs-up on this report, so I don't doubt you're having the same issue. But can you please provide details about your configuration: which parts exactly are in common with this report, and which details differ?
What version of Helm Operator are you using, and what type of cluster is it deployed on? Have you been able to reproduce the issue with the more actively developed Helm Controller? Is it the same Bitnami Helm repo that fails, or others too?
The Bitnami repo is known to cause trouble because it is large (like the "stable" repo that was deprecated by the official Helm maintainers in favor of the more decentralized model of "app maintainers should host their own Helm repo"), so it may not fully sync within the default timeout period. Looking through the Helm Operator docs, I don't see any parameter to configure this timeout. There is a chart sync interval, a git poll interval, and a git timeout value, but no chart sync timeout.
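For reference, those existing knobs are plain flags on the operator container; a rough Deployment excerpt (flag names as I recall them from the docs, worth double-checking against your chart and version) looks like this:

```yaml
# Excerpt of a helm-operator Deployment spec showing the existing tunables
# mentioned above; the values here are only illustrative.
containers:
  - name: flux-helm-operator
    image: docker.io/fluxcd/helm-operator:1.1.0
    args:
      - --charts-sync-interval=3m   # how often HelmReleases are reconciled
      - --git-poll-interval=5m      # how often git mirrors are polled for new commits
      - --git-timeout=20s           # timeout for git operations
      # note: none of these bound the time spent downloading a chart
      # repository index such as Bitnami's
```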
I think it would be pretty trivial to add a configurable timeout, but as our interest is in advancing the latest version of Flux and eventually ending the maintenance of Helm Operator, which is pretty well superseded now, I'm a lot more interested in what's stopping you from upgrading to Flux v2, if you have not been able to do so yet.
The HelmRelease from our more modern Helm Controller in Flux v2 is driven by a separate HelmRepository resource that is configurable with its own timeout parameters, and although it still suffers from this issue (large repos are slow to sync), it can be mitigated by adjusting the timeout on the specific repo that gives you trouble. So I'm hoping that if you can upgrade, this problem goes away for you without any further changes or releases in the Helm Operator project.
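To illustrate (field names from the Flux v2 source-controller API; the exact apiVersion may differ depending on your Flux release), the repository's sync interval and download timeout live on the HelmRepository object itself:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: flux-system
spec:
  url: https://charts.bitnami.com/bitnami
  interval: 30m   # how often the repository index is refreshed
  timeout: 3m     # raise this if the large index is slow to download
```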
Sorry if your issue remains unresolved. The Helm Operator is in maintenance mode; we recommend that everybody upgrade to Flux v2 and the Helm Controller.
A new release of Helm Operator is out this week, 1.4.4.
We will continue to support Helm Operator in maintenance mode for an indefinite period of time, and eventually archive this repository.
Please be aware that Flux v2 has a vibrant and active developer community working through minor releases and delivering new features on the way to General Availability.
In the meantime, this repo will still be monitored, but support is basically limited to migration issues only. I will have to close many issues today without reading them all in detail because of time constraints. If your issue is very important, you are welcome to reopen it, but given how stale all issues are at this point, a fresh report is more likely to be in order. If you have unresolved problems that prevent your migration, please open a new issue in the appropriate Flux v2 repo.
Helm Operator releases will continue as possible for a limited time, as a courtesy for those who cannot migrate yet, but they are strongly discouraged for ongoing production use: our strict adherence to semver backward-compatibility guarantees limits how far many dependencies can be upgraded without breaking compatibility, so there are likely known CVEs that cannot be resolved.
We recommend upgrading to Flux v2, which is actively maintained, as soon as possible.
I am going to go ahead and close every issue at once today. Thanks for participating in Helm Operator and Flux! 💚 💙