source-controller
[bug] HelmRepository blocked if the secret does not exist at startup
Steps:
- install fluxcd
- add a HelmRepository resource with a secretRef
- wait until the HelmRepository fails
- create the referenced Secret
- or have it unsealed by sealed-secrets ...
- ...
Error Behaviour:
- the HelmRepository does not reconcile once the working Secret exists
Expected Behaviour:
- the HelmRepository reconciles after the given time / interval
Workaround:
- kill / restart the source-controller pod
In fluxcd versions:
- 0.41.2
- 2.0.0-rc5
- 2.0.1
Does it happen because of the error handling here?
https://github.com/fluxcd/source-controller/blob/7f40be76e90b2d4afe9f8f9d7f53ac719fe1205e/internal/controller/helmrepository_controller.go#L411-L416
For the GitRepository (where it works), we get a "Generic" error: https://github.com/fluxcd/source-controller/blob/7f40be76e90b2d4afe9f8f9d7f53ac719fe1205e/internal/controller/gitrepository_controller.go#L485-L491
Maybe the old error leads the HelmRepository to stay blocked permanently.
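To make that distinction concrete, here is a minimal, self-contained sketch (assumed semantics, not the actual source-controller code) of the difference between a generic error, which the runtime retries with backoff, and a terminal / stalling error, which stops requeueing until the spec changes:

package main

import (
	"errors"
	"fmt"
	"time"
)

// result mimics controller-runtime's ctrl.Result; the names here are illustrative only.
type result struct {
	requeueAfter time.Duration
}

// reconcile sketches the two error-handling strategies discussed above.
// With stallOnAuthError=false the error is returned as a generic error, the
// runtime requeues with backoff, and the object recovers once the Secret appears.
// With stallOnAuthError=true a missing Secret is treated as terminal: a Stalled
// condition would be recorded and no error returned, so the object is never
// requeued and stays blocked until its spec changes or the controller restarts.
func reconcile(secretExists, stallOnAuthError bool) (result, error) {
	if !secretExists {
		err := errors.New(`failed to get secret: secrets "example" not found`)
		if stallOnAuthError {
			// record a Stalled condition here, then swallow the error
			return result{}, nil
		}
		return result{}, err
	}
	// success: requeue at the configured interval
	return result{requeueAfter: time.Minute}, nil
}

func main() {
	fmt.Println(reconcile(false, false)) // generic: retried with backoff, self-heals
	fmt.Println(reconcile(false, true))  // stalling: no retry, stays blocked
}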
Hi, I just tried it but I couldn't reproduce it. I created the following helmrepo:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 1m
  url: https://stefanprodan.github.io/podinfo
  secretRef:
    name: "example"
The secret doesn't exist yet. I got the following errors in the logs:
{"level":"error","ts":"2023-07-21T16:04:40.646+0530","msg":"Reconciler error","controller":"helmrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"HelmRepository","HelmRepository":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"3276abd8-8a54-4057-8bc7-ab7664327a44","error":"failed to get secret 'default/example': secrets "example" not found"}
The status of helmrepo shows (kubectl get helmrepository podinfo -o yaml):
status:
  conditions:
  - lastTransitionTime: "2023-07-21T10:34:45Z"
    message: building artifact
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T10:34:45Z"
    message: 'failed to get secret ''default/example'': secrets "example" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-21T10:34:40Z"
    message: 'failed to get secret ''default/example'': secrets "example" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1
After creating the secret, within a few seconds, the logs show:
{"level":"info","ts":"2023-07-21T16:06:16.387+0530","msg":"stored fetched index of size 43.13kB from 'https://stefanprodan.github.io/podinfo'","controller":"helmrepository", "controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"HelmRepository","HelmRepository":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinf o","reconcileID":"96dcf686-9538-462e-b832-be6f1f873be5"}
and the helmrepo status shows:
status:
  artifact:
    digest: sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e
    lastUpdateTime: "2023-07-21T10:36:16Z"
    path: helmrepository/default/podinfo/index-80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e.yaml
    revision: sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e
    size: 43126
    url: http://:0/helmrepository/default/podinfo/index-80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e.yaml
  conditions:
  - lastTransitionTime: "2023-07-21T10:36:16Z"
    message: 'stored artifact: revision ''sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e'''
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-07-21T10:36:16Z"
    message: 'stored artifact: revision ''sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e'''
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  observedGeneration: 1
  ...
An object can get blocked if it has a Stalled condition in its status, which we don't have in this case.
Can you check the status of the blocked helmrepo and share it?
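For reference, a genuinely stalled object carries a terminal condition in its status roughly like the following (an illustrative example, not output from a real cluster):
status:
  conditions:
  - lastTransitionTime: "<timestamp>"
    message: <terminal failure message>
    observedGeneration: 1
    reason: <terminal reason>
    status: "True"
    type: Stalled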
@genofire when reporting bugs please say which version you're using by simply posting the flux check output.
► checking prerequisites
✔ Kubernetes 1.24.6 >=1.24.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.34.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.0-rc.4
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0-rc.4
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.0-rc.5
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed
That's the CLI version, what about controllers and CRDs? flux check prints those.
No, I mean that the namespace has the version label of 2.0.0-rc5 - I have edited / updated the message.
Can you please upgrade to Flux v2.0.1 and see if this issue persists?
That will take time -> we have 30 clusters including staging.
Not asking you to upgrade all of them, just one to rerun the test. We've tried to replicate this with 2.0.1 and the HelmRepository is not getting stuck. Also what type of repo are you using? OCI or Helm HTTP?
It would also be helpful if you could post here kubectl get helmrepository --show-managed-fields -o yaml for the one that's stuck.
So the secret has existed for 31 minutes now:
helmrepo:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  annotations:
    meta.helm.sh/release-name: infra-infra-base
    meta.helm.sh/release-namespace: infra
  creationTimestamp: "2023-07-21T12:46:01Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: infra-base
    helm.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
      f:spec:
        .: {}
        f:interval: {}
        f:provider: {}
        f:secretRef:
          .: {}
          f:name: {}
        f:timeout: {}
        f:url: {}
    manager: helm-controller
    operation: Update
    time: "2023-07-21T12:46:01Z"
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
    manager: source-controller
    operation: Update
    subresource: status
    time: "2023-07-21T12:49:23Z"
  name: opstree
  namespace: infra
  resourceVersion: "32531330664"
  uid: 3021755c-d010-454f-8b88-fecf6ded654f
spec:
  interval: 5m
  provider: generic
  secretRef:
    name: internal-artifactory-auth
  timeout: 60s
  url: https://repo-ex.internal.de/artifactory/ot-container-kit-helm-remote/
status:
  conditions:
  - lastTransitionTime: "2023-07-21T12:49:23Z"
    message: building artifact
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T12:49:23Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-21T12:46:02Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1
oci helmrepo:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  annotations:
    meta.helm.sh/release-name: infra-infra-base
    meta.helm.sh/release-namespace: infra
  creationTimestamp: "2023-07-21T12:46:01Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: infra-base
    helm.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
      f:spec:
        .: {}
        f:interval: {}
        f:provider: {}
        f:secretRef:
          .: {}
          f:name: {}
        f:timeout: {}
        f:type: {}
        f:url: {}
    manager: helm-controller
    operation: Update
    time: "2023-07-21T12:46:01Z"
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
    manager: source-controller
    operation: Update
    subresource: status
    time: "2023-07-21T12:49:23Z"
  name: weave-gitops
  namespace: infra
  resourceVersion: "32531330680"
  uid: 2a2a7e1b-a809-4992-9f76-a8c5d7650133
spec:
  interval: 60m0s
  provider: generic
  secretRef:
    name: internal-artifactory-auth
  timeout: 60s
  type: oci
  url: oci://docker-virtual.repo-ex.internal.de/weaveworks/charts
status:
  conditions:
  - lastTransitionTime: "2023-07-21T12:49:22Z"
    message: 'processing object: new generation -1 -> 1'
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T12:46:02Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  observedGeneration: -1
If you run flux reconcile helmrepository, does it find the secret or does the same thing happen?
If I trigger it twice:
# flux reconcile source helm -n infra weave-gitops
► annotating HelmRepository weave-gitops in infra namespace
✔ HelmRepository annotated
◎ waiting for HelmRepository reconciliation
✗ HelmRepository reconciliation failed: 'failed to get secret 'infra/internal-artifactory-auth': secrets "internal-artifactory-auth" not found'
# flux reconcile source helm -n infra weave-gitops
► annotating HelmRepository weave-gitops in infra namespace
✔ HelmRepository annotated
◎ waiting for HelmRepository reconciliation
✔ Helm repository is ready
This is really strange. Is your Kubernetes API under heavy load, or is etcd having any issues? This may be a caching issue; we have disabled the caching of Secrets in our controllers, but the API server does its own caching as well.
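For context, a controller built on controller-runtime can opt Secrets out of the cached client so every read goes straight to the API server instead of a possibly stale informer cache; a minimal sketch of that configuration (controller-runtime v0.15+ style, not necessarily the exact Flux setup) looks roughly like this:

package main

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Build a manager whose client bypasses the informer cache for Secrets,
	// so Secret reads always hit the API server directly.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Client: client.Options{
			Cache: &client.CacheOptions{
				DisableFor: []client.Object{&corev1.Secret{}},
			},
		},
	})
	if err != nil {
		panic(err)
	}

	// Start the manager (blocks); reconcilers would be registered before this.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}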
It is the cloud provider IONOS ... we have no control over etcd. My problem is that I do not see any log entries about a reconciliation of this HelmRepository (I do see them for others) ... it looks like it is stalled.
We have had that problem daily for over two months (always when we create a new cluster and install your default resources there).
If you are right that the kube-api requests suffer under heavy load, then maybe we should put a timeout on the requests there (maybe that is the problem). Here is my code: https://github.com/fluxcd/pkg/pull/627
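The general pattern behind that idea (sketched here with a hypothetical fetchSecret helper, independent of what the PR actually does) is to put a deadline on each API request so an overloaded kube-apiserver makes the call fail fast instead of hanging the reconcile:

package main

import (
	"context"
	"fmt"
	"time"
)

// fetchSecret stands in for a Kubernetes API call; the name and signature
// are hypothetical, only the context-timeout pattern matters here.
func fetchSecret(ctx context.Context, name string) error {
	select {
	case <-time.After(30 * time.Second): // simulate a slow API server
		return nil
	case <-ctx.Done():
		return fmt.Errorf("failed to get secret %q: %w", name, ctx.Err())
	}
}

func main() {
	// Bound the request: after 5 seconds the call returns an error instead
	// of blocking the reconcile loop indefinitely.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := fetchSecret(ctx, "internal-artifactory-auth"); err != nil {
		fmt.Println(err)
	}
}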