
Randomly "failed to prepare chart for release: chart unavailable error"

mbuccini opened this issue 4 years ago

Describe the bug

Hello,

We are using helm-operator, installed through the chart (v1.2.0), to install 5 custom releases in a specific namespace, tenant-namespace. After creating this namespace, we add the 5 HelmRelease resources. Right afterwards, we start helm-operator (in the helm-operator-namespace) with the flag allowNamespace=tenant-namespace.

When the helm-operator starts, we can see the following logged lines:

{"caller":"repository.go:116","component":"helm","info":"successfully imported repository","name":"REPO-NAME","ts":"2020-09-21T12:21:34.131091787Z","url":"https://HOST/REPOSITORY","version":"v3"}
{"caller":"operator.go:86","component":"operator","info":"setting up event handlers","ts":"2020-09-21T12:21:34.131522693Z"}
{"caller":"operator.go:107","component":"operator","info":"event handlers set up","ts":"2020-09-21T12:21:34.131549858Z"}
{"caller":"main.go:300","component":"helm-operator","info":"waiting for informer caches to sync","ts":"2020-09-21T12:21:34.131561111Z"}
{"caller":"main.go:305","component":"helm-operator","info":"informer caches synced","ts":"2020-09-21T12:21:34.231668816Z"}
{"caller":"operator.go:119","component":"operator","info":"starting operator","ts":"2020-09-21T12:21:34.231754463Z"}
{"caller":"operator.go:121","component":"operator","info":"starting workers","ts":"2020-09-21T12:21:34.231817931Z"}
{"caller":"git.go:104","component":"gitchartsync","info":"starting sync of git chart sources","ts":"2020-09-21T12:21:34.23185209Z"}
{"caller":"server.go:42","component":"daemonhttp","info":"starting HTTP server on :3030","ts":"2020-09-21T12:21:34.231932341Z"}
{"caller":"release.go:79","component":"release","helmVersion":"v3","info":"starting sync run","release":"helmrelease-1","resource":"tenant-namespace:helmrelease/helmrelease-1","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:34.232395503Z"}
{"caller":"release.go:79","component":"release","helmVersion":"v3","info":"starting sync run","release":"helmrelease-2","resource":"tenant-namespace:helmrelease/helmrelease-2","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:34.232407144Z"}
{"caller":"release.go:79","component":"release","helmVersion":"v3","info":"starting sync run","release":"helmrelease-5","resource":"tenant-namespace:helmrelease/helmrelease-5","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:34.232474919Z"}
{"caller":"release.go:79","component":"release","helmVersion":"v3","info":"starting sync run","release":"helmrelease-3","resource":"tenant-namespace:helmrelease/helmrelease-3","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:34.232668217Z"}
{"caller":"release.go:79","component":"release","helmVersion":"v3","info":"starting sync run","release":"helmrelease-4","resource":"tenant-namespace:helmrelease/helmrelease-4","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:34.23272295Z"}
{"caller":"checkpoint.go:24","component":"checkpoint","latest":"0.10.1","msg":"up to date","ts":"2020-09-21T12:21:34.407163274Z"}
{"caller":"release.go:85","component":"release","error":"failed to prepare chart for release: chart unavailable: no cached repo found. (try 'helm repo update'): no API version specified","helmVersion":"v3","release":"helmrelease-3","resource":"tenant-namespace:helmrelease/helmrelease-3","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:42.660684634Z"}
{"caller":"release.go:313","component":"release","helmVersion":"v3","info":"running installation","phase":"install","release":"release-1","resource":"helmrelease-1","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:47.066431409Z"}
{"caller":"release.go:313","component":"release","helmVersion":"v3","info":"running installation","phase":"install","release":"helmrelease-2","resource":"tenant-namespace:helmrelease/helmrelease-2","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:47.564188622Z"}
{"caller":"release.go:313","component":"release","helmVersion":"v3","info":"running installation","phase":"install","release":"helmrelease-4","resource":"tenant-namespace:helmrelease/helmrelease-4","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:47.605563963Z"}
{"caller":"helm.go:69","component":"helm","info":"creating 6 resource(s)","release":"helmrelease-1","targetNamespace":"tenant-namespace","ts":"2020-09-21T12:21:47.66017291Z","version":"v3"}

It seems that it installs some of the releases fine, while in one case it fails to find the chart. The chart is definitely in the index, and the odd thing is that the failure is not consistent: sometimes it fails for some charts, sometimes for others, and sometimes all the charts install without problems.

We also had cases where the second sync failed as well: at startup, during the first sync, two releases failed. On the second sync, one of the two that had previously failed was installed correctly, while the other failed again, only to be installed successfully on the third sync.

We even did some performance tests (with 150 helm-operators): in roughly 90% of the runs it fails to install one release or another, or a mix of 2-3 releases. In only 10% of the cases does it work without issues.

We set the following properties in the helm-operator chart:

values:
  updateChartDeps: false
  chartsSyncInterval: 5m
  statusUpdateInterval: 2m
  workers: 6
  logFormat: json
  helm:
    versions: v3
  resources:
    requests:
      memory: 64Mi
      cpu: 100m
    limits:
      memory: 1500Mi
      cpu: 1500m

  allowNamespace: [[tenantNamespace]]

  configureRepositories:
    enable: true
    volumeName: repositories-config
One last piece of information that might help: the index file is ~10MB. Could this be the reason why it fails to load a chart? I dug into the code, and this seems to be the function that triggers the failure:

func loadIndex(data []byte) (*IndexFile, error) {
	i := &IndexFile{}
	if err := yaml.UnmarshalStrict(data, i); err != nil {
		return i, err
	}
	i.SortEntries()
	if i.APIVersion == "" {
		return i, ErrNoAPIVersion
	}
	return i, nil
}

The unmarshalling itself, though, doesn't seem to be the cause: if it failed, the function would return that error before ever reaching ErrNoAPIVersion, yet the error we see is "no API version specified". I wonder if the cause is that we have 5 concurrent releases and this "big" file is being read/sorted in memory for each release, so maybe something odd happens under concurrency?
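
To illustrate that hypothesis, here is a minimal sketch (not helm-operator code, and assuming a sigs.k8s.io/yaml-style strict unmarshal like the snippet above; the indexFile struct is a trimmed-down stand-in for Helm's repo.IndexFile): an empty or half-written index.yaml does not fail unmarshalling at all, it simply leaves apiVersion empty, which lands exactly on the "no API version specified" branch.

package main

import (
	"fmt"

	"sigs.k8s.io/yaml" // assumption: the same strict-unmarshal helper family Helm uses
)

// indexFile is a trimmed-down stand-in for Helm's repo.IndexFile.
type indexFile struct {
	APIVersion string                              `json:"apiVersion"`
	Entries    map[string][]map[string]interface{} `json:"entries"`
}

func check(label string, data []byte) {
	i := &indexFile{}
	if err := yaml.UnmarshalStrict(data, i); err != nil {
		fmt.Printf("%s: unmarshal error: %v\n", label, err)
		return
	}
	if i.APIVersion == "" {
		// Mirrors Helm's ErrNoAPIVersion ("no API version specified").
		fmt.Printf("%s: no API version specified\n", label)
		return
	}
	fmt.Printf("%s: apiVersion=%s\n", label, i.APIVersion)
}

func main() {
	// A reader racing with the index writer could observe either of these:
	check("empty file", []byte(""))
	check("truncated file", []byte("entries:\n  chart-a:\n  - name: chart-a\n    version: 1.0.0\n"))
	// A fully written index has apiVersion and parses cleanly:
	check("complete file", []byte("apiVersion: v1\nentries: {}\n"))
}

So if a worker ever reads the cached index file while it is empty or only partially written, this is exactly the error it would report.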

If you need more info, please let me know, I will share anything I can.

To Reproduce

I can't provide an archive to reproduce this, but you need:

  • a relatively big index file (~10MB),
  • two separate namespaces, with helm-operator running in one and the HelmReleases in the other,
  • 6 workers (or at least 5),
  • the 5 HelmRelease resources created in the namespace before helm-operator starts.

Expected behavior

I expect all the releases to be installed correctly, without errors (as happens in 10% of the cases).

Logs

See above.

Additional context

  • Helm Operator version: 1.2.0
  • Targeted Helm version:
version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}
  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:42:56Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Git provider:
  • Container registry provider:

mbuccini · Sep 23 '20

After doing some more tests and trying various fixes, we were able to work around the issue by using a Helm chart repository with a smaller index (i.e. fewer entries). Apparently, helm-operator doesn't cope well when there are too many entries in the index, so if you hit the same error, I'd recommend finding a way to use a different chart repository with fewer entries. We didn't get any errors with an index of just a few KBs, but we can't tell at what size the issue starts to appear or when helm-operator starts to degrade.

If you want, feel free to close this issue. Although the underlying problem is not solved, there is at least a workaround for it.

mbuccini · Sep 29 '20

We have run into this same issue, and I believe it's a race condition when updating/reading the Helm index file. I can consistently replicate it with these steps:

  1. configure multiple charts to be deployed (we can replicate with 7 charts)
  2. set worker threads equal to the number of charts
  3. run the following script to clear caches and trigger an upgrade (you may have to run it repeatedly, and it may depend on the helm index.yaml file size)
# this clears the index and chart caches on the operator
kubectl exec {HELM_OPERATOR_POD} -- /bin/bash -c "rm -rf {REPO_INDEX_PATH}.yaml && rm -rf /tmp/v3/*"

# this updates a timestamp in the HelmRelease.spec.values, then applies the spec
# triggering a helm install or upgrade of the configured charts
sed "s/REPLACE_TIMESTAMP/$(date)/g" "{HELM_RELEASE_SPECS}.yml" | kubectl apply -f - 

This sequence replicates the following error:

ts=2021-03-30T15:54:51.79307505Z caller=release.go:85 component=release release=redacted targetNamespace=redacted resource=redacted helmVersion=v3 error="failed to prepare chart for release: chart unavailable: no cached repo found. (try 'helm repo update'): no API version specified"

I believe this is because the index download/write and the index reads are not protected by a mutex, so there can be conflicts when multiple workers have chart/index cache misses and trigger an index download at the same time. One worker downloads the index and proceeds to install/upgrade, but fails when reading the index because it is still in the process of being written by a different worker.
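
If that is the case, a rough sketch of the kind of serialization that would avoid it could look like the following (illustrative only, not helm-operator's actual code; refreshIndex, readIndex, and the download callback are made-up names): hold a mutex across refreshes and reads, and write the index atomically via a temp file plus rename, so a reader can never observe a half-written file.

package indexsync

import (
	"os"
	"sync"
)

// indexMu guards the cached repository index file on disk.
var indexMu sync.Mutex

// refreshIndex downloads the repository index and writes it atomically:
// first to a temp file, then renamed over the cached path, so readers
// never see a partially written index.yaml.
func refreshIndex(cachePath string, download func() ([]byte, error)) error {
	indexMu.Lock()
	defer indexMu.Unlock()

	data, err := download()
	if err != nil {
		return err
	}
	tmp := cachePath + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, cachePath)
}

// readIndex takes the same lock, so it can never overlap with a refresh.
func readIndex(cachePath string) ([]byte, error) {
	indexMu.Lock()
	defer indexMu.Unlock()
	return os.ReadFile(cachePath)
}

Even without the mutex, the temp-file-plus-rename write alone would prevent a reader from ever seeing a truncated index, because the rename swaps in the complete file in a single step.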

limscoder · Mar 30 '21

Sorry if your issue remains unresolved. The Helm Operator is in maintenance mode; we recommend that everybody upgrade to Flux v2 and the Helm Controller.

A new release of Helm Operator is out this week, 1.4.4.

We will continue to support Helm Operator in maintenance mode for an indefinite period of time, and eventually archive this repository.

Please be aware that Flux v2 has a vibrant and active developer community, working through minor releases and delivering new features on the way to General Availability.

In the meantime, this repo will still be monitored, but support is basically limited to migration issues only. I will have to close many issues today without reading them all in detail because of time constraints. If your issue is very important, you are welcome to reopen it, but given how stale all issues are at this point, a new report is more likely to be in order. Please open another issue in the appropriate Flux v2 repo if you have unresolved problems that prevent your migration.

Helm Operator releases will continue as possible for a limited time, as a courtesy for those who cannot migrate yet, but they are strongly not recommended for ongoing production use: our strict adherence to semver backward-compatibility guarantees limits how far we can upgrade many dependencies without breaking compatibility, so there are likely known CVEs that cannot be resolved.

We recommend upgrading as soon as possible to Flux v2, which is actively maintained.

I am going to go ahead and close every issue at once today. Thanks for participating in Helm Operator and Flux! 💚 💙

kingdonb · Sep 02 '22