
Constantly creating new releases for charts even when no changes

Open brpaz opened this issue 5 years ago • 51 comments

Describe the bug

Hello.

I am starting to experiment with Flux and the Helm Operator on a new cluster, and everything went fine until I deployed the cert-manager Helm chart.

Each time the sync runs, the Helm Operator performs an upgrade and creates a new release, even though nothing in the chart has changed.

This is causing some instability in the network of my cluster (possibly due to excessive load on the API server resulting from the constant updates).

What is strange is that if I run kubectl -n cert-manager get helmreleases.helm.fluxcd.io, the latest update date is still the initial deploy. Yet a new Secret with the Helm release information is created every time, and the pods are restarted.

To Reproduce

Just a basic HelmRelease manifest:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  releaseName: cert-manager
  chart:
    repository: https://charts.jetstack.io
    name: cert-manager
    version: 0.15.1
  values:
    installCRDs: true
    global:
      leaderElection:
        namespace: cert-manager
    ingressShim:
      defaultIssuerName: letsencrypt-prod
      defaultIssuerKind: ClusterIssuer
    prometheus:
      enabled: false

Expected behavior

cert-manager should only be upgraded when there is an actual change.

Logs

Here is the log output:

ts=2020-06-09T11:27:43.485024459Z caller=helm.go:69 component=helm version=v3 info="checking 31 resources for changes" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.511728151Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ServiceAccount \"cert-manager\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.52932945Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ServiceAccount \"cert-manager-webhook\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.593238042Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"certificaterequests.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.707520153Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"certificates.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.787600969Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"challenges.acme.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.840798706Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"clusterissuers.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.914200336Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"issuers.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:43.976102766Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for CustomResourceDefinition \"orders.acme.cert-manager.io\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.006756511Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-issuers\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.170684941Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-clusterissuers\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.186528407Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-certificates\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.206401365Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-orders\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.226948001Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-challenges\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.242952126Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-controller-ingress-shim\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.270264045Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-view\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.288023181Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRole \"cert-manager-edit\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.323437971Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-issuers\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.334162638Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-clusterissuers\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.349806345Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-certificates\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.364670113Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-orders\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.42316236Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-challenges\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.442380377Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ClusterRoleBinding \"cert-manager-controller-ingress-shim\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.457693993Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Role \"cert-manager:leaderelection\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.471976952Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Role \"cert-manager-webhook:dynamic-serving\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.509698246Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Service \"cert-manager-webhook\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.534244597Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Deployment \"cert-manager\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:44.550004375Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Deployment \"cert-manager-webhook\"" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:45.300454292Z caller=helm.go:69 component=helm version=v3 info="updating status for upgraded release for cert-manager" targetNamespace=cert-manager release=cert-manager
ts=2020-06-09T11:27:46.000638151Z caller=release.go:309 component=release release=cert-manager targetNamespace=cert-manager resource=cert-manager:helmrelease/cert-manager helmVersion=v3 info="upgrade succeeded" revision=0.15.1 phase=upgrade

Sometimes I also found these errors:

warning="failed to annotate release resources: serviceaccount/cert-manager annotated" phase=annotate

I'm not exactly sure what this means, but it seems to take some time to run.

Additional context

  • Helm Operator version: 1.1.0 (installed with the Helm chart)
  • Kubernetes version: 1.16.8 (Digital Ocean)

brpaz avatar Jun 09 '20 13:06 brpaz

It seems version 1.1.0 constantly creates new releases for all HelmRelease definitions. Downgrading to 1.0.2 (i.e. the Helm version of helm-operator) resolved the issue for me.

mtneug avatar Jun 14 '20 14:06 mtneug

I'm facing a similar issue with the Strimzi Kafka operator. With 1.0.2 it works well, but with 1.1.0 it keeps syncing the chart even though there is no change.

lucioveloso avatar Jun 16 '20 15:06 lucioveloso

I'm seeing the same issue with version 1.1.0 and cert-manager 0.15.0

surfingsimbo avatar Jun 18 '20 10:06 surfingsimbo

We're seeing the same issue with v1.1.0 and cert-manager v0.15.0.

NAME          NAMESPACE     REVISION  UPDATED                                  STATUS    CHART                 APP VERSION
cert-manager  cert-manager  244       2020-06-19 08:20:30.927272479 +0000 UTC  deployed  cert-manager-v0.15.0  v0.15.0

stevehipwell avatar Jun 18 '20 21:06 stevehipwell

We had the same issue, but in our case it was caused by memory limits for the Helm Operator that were set too low, which caused it to be restarted by Kubernetes. After a restart, the Helm Operator has to download all the charts again, which sets the chart.changed value to true for all charts, and that causes the upgrade action to be called instead of the dry-run.

I would prefer it if the dry-run action were called here.
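For reference, a minimal sketch of raising the operator's memory via the helm-operator chart values (the resources block and the figures here are assumptions, not values from this thread; check the values.yaml of the chart version you run):

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 512Mi  # raise this if the operator pod is being OOM-killed and restarted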

twendt avatar Jun 29 '20 12:06 twendt

I'm also seeing this problem with some custom charts. Each reconcile iteration seems to create a new release upgrade even though nothing changes.

EDIT: Figured it out after doing some debugging. It looks like if we use a semver range for the chart version, e.g. ~> 2.0, in the release, it breaks the comparison here, since the value of hr.Status.LastAttemptedRevision will not be the resolved version of the chart but the raw ~> 2.0 string. The comparison obviously fails, causing the operator to assume there is always a new version of the chart.
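To illustrate, a rough sketch of the kind of spec that triggers it (chart name and repository are placeholders, not from this thread):

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: my-app
  namespace: default
spec:
  releaseName: my-app
  chart:
    repository: https://example.com/charts
    name: my-app
    version: "~> 2.0"  # semver range instead of a pinned version; resolved to e.g. 2.0.3 at fetch time

With a range, status.lastAttemptedRevision holds the raw "~> 2.0" string, so comparing it against the resolved chart version never matches and an upgrade is attempted on every sync.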

My issue doesn't seem to be related to the cert-manager issue described here, I think it would deserve its own ticket.

EDIT2: Relevant issue #490 (resolved in v1.2.0, my comment can be ignored).

relu avatar Jul 29 '20 09:07 relu

I am also having this issue with a lot of charts; more info can be found in Slack. Can we have a maintainer chime in, or at least rename this issue? From what I've seen, it can happen with any random chart.

I am not seeing the helm-operator pod being restarted like @twendt has.

https://cloud-native.slack.com/archives/CLAJ40HV3/p1597320334119100

onedr0p avatar Aug 13 '20 12:08 onedr0p

@onedr0p I have renamed the issue to be more generic.

Still, we need an official response. This issue has been open for 2 months without any feedback, and I think it's quite critical. The constant releases killed my cluster.

I want to fully dive into GitOps, but this issue being open for so long without any feedback doesn't inspire much confidence.

@stefanprodan, can some maintainer look at this, please?

brpaz avatar Aug 16 '20 18:08 brpaz

I too observed this today, up to revision 2291(!) of a Helm Operator (v1.2.0) controlled HelmRelease.

davidholsgrove avatar Aug 17 '20 01:08 davidholsgrove

Same here. I have just completed the 'get-started' tutorial and the demo apps are upgraded every 5 minutes:

ts=2020-08-17T02:37:31.935043189Z caller=helm.go:69 component=helm version=v3 info="checking 6 resources for changes" targetNamespace=demo release=redis
ts=2020-08-17T02:37:31.94590984Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Secret \"redis\"" targetNamespace=demo release=redis
ts=2020-08-17T02:37:31.955831337Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ConfigMap \"redis\"" targetNamespace=demo release=redis
ts=2020-08-17T02:37:31.97589111Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for ConfigMap \"redis-health\"" targetNamespace=demo release=redis
ts=2020-08-17T02:37:31.985966338Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Service \"redis-headless\"" targetNamespace=demo release=redis
ts=2020-08-17T02:37:31.994606993Z caller=helm.go:69 component=helm version=v3 info="Looks like there are no changes for Service \"redis-master\"" targetNamespace=demo release=redis
ts=2020-08-17T02:37:32.056348166Z caller=helm.go:69 component=helm version=v3 info="updating status for upgraded release for redis" targetNamespace=demo release=redis
ts=2020-08-17T02:37:32.213089037Z caller=release.go:364 component=release release=redis targetNamespace=demo resource=demo:helmrelease/redis helmVersion=v3 info="upgrade succeeded" revision=10.3.1 phase=upgrade

Helm list:

NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
mongodb demo            22              2020-08-17 02:40:31.151206363 +0000 UTC deployed        mongodb-7.6.3   4.0.14
redis   demo            22              2020-08-17 02:40:31.296983141 +0000 UTC deployed        redis-10.3.1    5.0.7

eklee avatar Aug 17 '20 02:08 eklee

Sorry for the late response all; the last months have been hectic in terms of workload and GitOps Toolkit developments, and I was enjoying time off for the last two weeks.

I tried to reproduce the issue with version 1.2.0 of the Helm Operator with both the Redis HelmRelease example @eklee reported and the cert-manager HelmRelease in the issue, and was unable to observe any spurious upgrades.

@davidholsgrove could it be possible that the revision drift up to 2291 was due to the misbehaving 1.1.0 version, and thus fixed by #490? If not, can you all please share the Status object of the misbehaving HelmRelease? This would give me better insight into why it may happen.

hiddeco avatar Aug 17 '20 09:08 hiddeco

@hiddeco I've been using helm-operator v1.2.0 since the 10th of August, and only noticed the runaway upgrades this week. It's occurring in 3 separate k8s clusters (all using fluxcd 1.4.0 / helm-operator 1.2.0, with separate backing git repos of HelmReleases).

I'm using fixed chart versions, so not the same as https://github.com/fluxcd/helm-operator/issues/469

I've killed the fluxcd and helm-operator pods in each cluster to stop the helm history from being trashed.

Cluster 1

prometheus-operator and helm-operator continually upgrading;

$ helm ls -A
NAME                            NAMESPACE               REVISION        UPDATED                                         STATUS          CHART                                                   APP VERSION
flux                            fluxcd                  29              2020-08-09 22:20:49.084079856 +0000 UTC         deployed        flux-1.4.0                                              1.20.0
helm-operator                   fluxcd                  1032            2020-08-12 04:25:04.433300225 +0000 UTC         deployed        helm-operator-1.2.0                                     1.2.0
prometheus-operator             monitoring              497             2020-08-13 07:54:39.642859 +1000 AEST           deployed        prometheus-operator-9.3.1                               0.38.1

Upgrades occurring every 3 minutes (the last one done manually, after helm-operator had been stopped for a day):

$ helm -n monitoring history prometheus-operator
REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION
488             Wed Aug 12 04:01:19 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
489             Wed Aug 12 04:04:09 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
490             Wed Aug 12 04:07:10 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
491             Wed Aug 12 04:10:11 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
492             Wed Aug 12 04:13:18 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
493             Wed Aug 12 04:16:15 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
494             Wed Aug 12 04:19:14 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
495             Wed Aug 12 04:22:09 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
496             Wed Aug 12 04:25:18 2020        superseded      prometheus-operator-9.3.1       0.38.1          Upgrade complete
497             Thu Aug 13 07:54:39 2020        deployed        prometheus-operator-9.3.1       0.38.1          Upgrade complete
$ k -n monitoring describe hr prometheus-operator
Name:         prometheus-operator
Namespace:    monitoring
Labels:       fluxcd.io/sync-gc-mark=sha256.OfeOgFbdT4R06Z7OMl2uT9Wjy5tc0SdI9v8JqxaoqeQ
Annotations:  fluxcd.io/sync-checksum: 69091ad8e7fe97e3926e9d2256a4a42f1d87d459
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"helm.fluxcd.io/v1","kind":"HelmRelease","metadata":{"annotations":{"fluxcd.io/sync-checksum":"69091ad8e7fe97e3926e9d2256a4a...
API Version:  helm.fluxcd.io/v1
Kind:         HelmRelease
Metadata:
  Creation Timestamp:  2020-07-22T07:04:18Z
  Generation:          3
  Resource Version:    28654611
Spec:
  Chart:
    Name:        prometheus-operator
    Repository:  https://kubernetes-charts.storage.googleapis.com
    Version:     9.3.1
  Helm Version:  v3
  Release Name:  prometheus-operator

[--snip--]

Status:
  Conditions:
    Last Transition Time:   2020-07-22T07:04:49Z
    Last Update Time:       2020-08-12T04:25:10Z
    Message:                Chart fetch was successful for Helm release 'prometheus-operator' in 'monitoring'.
    Reason:                 ChartFetched
    Status:                 True
    Type:                   ChartFetched
    Last Transition Time:   2020-08-09T22:21:57Z
    Last Update Time:       2020-08-12T04:25:50Z
    Message:                Release was successful for Helm release 'prometheus-operator' in 'monitoring'.
    Reason:                 Succeeded
    Status:                 True
    Type:                   Released
  Last Attempted Revision:  9.1.1
  Observed Generation:      3
  Phase:                    Succeeded
  Release Name:             prometheus-operator
  Release Status:           deployed
  Revision:                 9.3.1
Events:                     <none>

Cluster 2

gitlab continually upgrading;

$ helm ls -A
NAME                            NAMESPACE               REVISION        UPDATED                                 STATUS          CHART                                   APP VERSION
flux                            fluxcd                  29              2020-07-31 11:11:32.259345348 +0000 UTC deployed        flux-1.4.0                              1.20.0
gitlab                          gitlab                  2291            2020-08-17 01:24:42.888677627 +0000 UTC deployed        gitlab-4.2.4                            13.2.4
helm-operator                   fluxcd                  29              2020-08-10 11:23:56.7837415 +1000 AEST  deployed        helm-operator-1.2.0                     1.2.0
$ helm -n gitlab history gitlab
REVISION        UPDATED                         STATUS          CHART           APP VERSION     DESCRIPTION
2283            Mon Aug 17 01:00:58 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2284            Mon Aug 17 01:03:42 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2285            Mon Aug 17 01:06:42 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2286            Mon Aug 17 01:09:43 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2287            Mon Aug 17 01:12:33 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2288            Mon Aug 17 01:15:38 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2289            Mon Aug 17 01:18:36 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2290            Mon Aug 17 01:21:41 2020        superseded      gitlab-4.2.4    13.2.4          Upgrade complete
2291            Mon Aug 17 01:24:42 2020        deployed        gitlab-4.2.4    13.2.4          Upgrade complete
2292            Mon Aug 17 01:27:37 2020        pending-upgrade gitlab-4.2.4    13.2.4          Preparing upgrade
$ k -n gitlab describe hr gitlab
Name:         gitlab
Namespace:    gitlab
Labels:       fluxcd.io/sync-gc-mark=sha256.tUcmp_UAo3ET0QtAeI2CG0-lgb2ZYTRDRYWgENMW2xI
Annotations:  fluxcd.io/sync-checksum: 627e765ba04176a6940be115cb3d686eb4b965f3
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"helm.fluxcd.io/v1","kind":"HelmRelease","metadata":{"annotations":{"fluxcd.io/sync-checksum":"627e765ba04176a6940be115cb3d6...
API Version:  helm.fluxcd.io/v1
Kind:         HelmRelease
Metadata:
  Creation Timestamp:  2020-07-07T07:57:21Z
  Generation:          15
  Resource Version:    36550979
Spec:
  Chart:
    Name:        gitlab
    Repository:  https://charts.gitlab.io
    Version:     4.2.4
  Helm Version:  v3
  Release Name:  gitlab

[--snip--]

Status:
  Conditions:
    Last Transition Time:   2020-07-31T11:13:15Z
    Last Update Time:       2020-08-17T01:25:22Z
    Message:                Release was successful for Helm release 'gitlab' in 'gitlab'.
    Reason:                 Succeeded
    Status:                 True
    Type:                   Released
    Last Transition Time:   2020-07-10T00:41:46Z
    Last Update Time:       2020-08-17T01:27:23Z
    Message:                Chart fetch was successful for Helm release 'gitlab' in 'gitlab'.
    Reason:                 ChartFetched
    Status:                 True
    Type:                   ChartFetched
  Last Attempted Revision:  4.1.4
  Observed Generation:      15
  Phase:                    ChartFetched
  Release Name:             gitlab
  Release Status:           pending-upgrade
  Revision:                 4.2.4
Events:                     <none>

davidholsgrove avatar Aug 17 '20 23:08 davidholsgrove

Hi there. I reported https://github.com/fluxcd/helm-operator/issues/469 in 1.1.0, but we are now running into the same issue for all releases in 1.2.0, just like most people here reported. Rolling back to 1.0.1 :(

gmaiztegi avatar Aug 18 '20 08:08 gmaiztegi

The problem seems to be that the LastAttemptedRevision is not set to the right version (but stuck on an older version), which is later used to determine if the release needs to be upgraded. This gives me sufficient information to work on a fix, but KubeCon is in the way today.

I will try to have a prerelease ready for you by tomorrow.

hiddeco avatar Aug 18 '20 08:08 hiddeco

Still unsuccessful in replicating the issue where the LastAttemptedRevision is not updated, even when jumping from 1.0.1 -> 1.1.0 -> 1.2.0 (while performing version upgrades for the HelmRelease in the meantime).

Given you all seem to have installed the helm-operator using Helm itself, can you please provide me with the output of kubectl get crd helmreleases.helm.fluxcd.io -o yaml, as I have a suspicion it may be due to Helm not performing upgrades for CRDs while a field has been added to the status field (since >=1.1.0).

(Another option would be to kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/v1.2.0/deploy/crds.yaml, and see if the problem goes away).

hiddeco avatar Aug 18 '20 17:08 hiddeco

Thanks @hiddeco - looks like the version of the CRD hadn't been updated, which caused the runaway helm-operator upgrades.

The HelmOperator chart option createCRD=true - is that the "right" way to go when we have HelmOperator managing the HelmRelease for HelmOperator?

Previously I had a (stale) version of the CRD in the git repo my FluxCD enforces. Would be good if HelmOperator had an initcontainer or other check and refused to start if its CRD was the wrong version maybe?

davidholsgrove avatar Aug 19 '20 05:08 davidholsgrove

The HelmOperator chart option createCRD=true - is that the "right" way to go when we have HelmOperator managing the HelmRelease for HelmOperator?

The right way is to not install the CRDs using the Helm chart, but to apply them manually / synchronize them using Flux (as written out in the install instructions).
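For example, a rough sketch of that setup (repository URL, namespace and chart values are illustrative; adapt them to your own manifests): the CRDs from https://raw.githubusercontent.com/fluxcd/helm-operator/v1.2.0/deploy/crds.yaml are committed to the Git repo Flux synchronizes, and the operator's own HelmRelease opts out of chart-managed CRDs:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: helm-operator
  namespace: fluxcd
spec:
  releaseName: helm-operator
  chart:
    repository: https://charts.fluxcd.io
    name: helm-operator
    version: 1.2.0
  values:
    createCRD: false  # CRD comes from the plain YAML in Git, not from the chart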

Previously I had a (stale) version of the CRD in the git repo my FluxCD enforces. Would be good if HelmOperator had an initcontainer or other check and refused to start if its CRD was the wrong version maybe?

If possible, that would likely be an improvement, but I do not think it will be implemented at this time (or in the near future) as we are working on a next-gen helm-controller that will eventually replace the Helm operator.

hiddeco avatar Aug 19 '20 09:08 hiddeco

@hiddeco we've been having this issue on v1.1.0 with the correct CRD managed by Flux. Are you saying that this issue affects v1.1.0 and v1.2.0 if the CRD isn't updated?

stevehipwell avatar Aug 19 '20 12:08 stevehipwell

After applying the CRD for v1.2.0, I am happy to report helm operator v1.2.0 is no longer doing unwarranted releases. I think this issue could be closed.

Thanks @hiddeco !

onedr0p avatar Aug 19 '20 12:08 onedr0p

Just a friendly reminder that Helm is not suitable for managing the lifecycle of Kubernetes CRD controllers. CRDs have to be extracted from charts and applied on the cluster with Flux as plain YAMLs, otherwise the controller version will diverge from its API and that can break production in various ways.

stefanprodan avatar Aug 19 '20 13:08 stefanprodan

@stefanprodan can you elaborate on how CRDs should be handled? I use cert-manager (the official Helm chart) and the prerequisite is to install their CRDs first, so basically the CRDs are outside the chart and not part of the HelmRelease. Thanks

talmarco avatar Aug 19 '20 13:08 talmarco

@talmarco it is described here; however, these notes should apply to upgrades too, not only installation

https://github.com/fluxcd/helm-operator/tree/master/chart/helm-operator#installation

onedr0p avatar Aug 19 '20 13:08 onedr0p

@onedr0p I already have the helm-operator CRD installed. My question was how to handle other CRDs (like cert-manager's), as I understand from @stefanprodan's answer that this was the root cause of the constant chart upgrades.

talmarco avatar Aug 19 '20 13:08 talmarco

When upgrading cert-manager you also need to manually apply the CRDs, or commit them to your repo for Flux to apply. The same applies here.

onedr0p avatar Aug 19 '20 13:08 onedr0p

@stevehipwell 1.1.0 has a bug which was fixed in 1.2.0, which requires an update of the CRD.

hiddeco avatar Aug 19 '20 13:08 hiddeco

Thanks @hiddeco, that was what I thought and what we've seen when testing the v1.2.0 release.

Just a friendly reminder that Helm is not suitable for managing the lifecycle of Kubernetes CRD controllers. CRDs have to be extracted from charts and applied on the cluster with Flux as plain YAMLs, otherwise the controller version will diverge from its API and that can break production in various ways.

@stefanprodan related to the above statement, could you confirm that skipCRDs is true by default for the HelmRelease custom resources (the docs don't give the default values)? If not, I'd be interested to know why. We've manually turned off all in-chart CRD values (looking at you, cert-manager) and set skipCRDs to true so that Flux can manage our CRDs, but that was based on our own understanding and not any actual documentation when we made this decision (around the time of the v1.1.0 release of the helm-operator).
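For context, this is roughly what that looks like for us (a sketch with illustrative chart details, and skipCRDs placed at the spec level per my reading of the v1 API, not copied from our actual manifests):

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  releaseName: cert-manager
  skipCRDs: true        # don't let Helm install the chart's crds/ directory
  chart:
    repository: https://charts.jetstack.io
    name: cert-manager
    version: 0.15.1
  values:
    installCRDs: false  # cert-manager's own in-chart CRD toggle, turned off
# the CRDs themselves live as plain YAML in the repo Flux synchronizes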

stevehipwell avatar Aug 19 '20 13:08 stevehipwell

@stevehipwell skipCRDs has no effect on upgrades, as Helm ignores the crds dir there; it only works at install time, and only if the CRDs are not already applied on the cluster. If you keep the CRDs in Git, Flux will apply them before the HelmReleases, so skipCRDs is not relevant.

stefanprodan avatar Aug 19 '20 13:08 stefanprodan

@stefanprodan I get the install-only path, but it would be safest if you had to manually opt in to installing CRDs rather than that being the default behaviour.

stevehipwell avatar Aug 19 '20 14:08 stevehipwell

Changing the skipCRDs default to true is not an option, as the HelmRelease API is at v1 and that would be a breaking change requiring a major version bump. We are working on HelmRelease v2 as part of the GitOps Toolkit; such a change could make it into v2. Please start a discussion in the toolkit repo and we can discuss it.

stefanprodan avatar Aug 19 '20 14:08 stefanprodan

Thanks @stefanprodan, I will. Although, out of interest and not wanting to sound like I'm accusing anyone of anything (I'm honestly just curious): weren't the v1.2.0 CRD changes breaking?

stevehipwell avatar Aug 19 '20 14:08 stevehipwell