helm-controller
Metallb installation with `driftDetection: mode: enabled` failed to apply revision
I'm trying to set up MetalLB with this Kustomization:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-metallb
  namespace: flux-system
spec:
  path: /apps/metallb-system/metallb/app
  sourceRef:
    kind: GitRepository
    name: apps
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta2
      kind: HelmRelease
      name: metallb
      namespace: metallb-system
  interval: 30m
  retryInterval: 1m
  timeout: 3m
```
And this HelmRelease:
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  interval: 15m
  driftDetection:
    mode: enabled
  chart:
    spec:
      chart: metallb
      version: "0.13.12"
      sourceRef:
        kind: HelmRepository
        name: metallb-charts
        namespace: flux-system
  maxHistory: 3
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    crds: CreateReplace
    remediation:
      retries: 3
  uninstall:
    keepHistory: false
  values:
    controller:
      logLevel: warn
    speaker:
      logLevel: warn
      frr:
        enabled: false
```
```
$ flux version
flux: v2.2.0
distribution: flux-v2.2.1
helm-controller: v0.37.1
image-automation-controller: v0.37.0
image-reflector-controller: v0.31.1
kustomize-controller: v1.2.1
notification-controller: v1.2.3
source-controller: v1.2.3
```
`flux get kustomizations` shows that the Kustomization never becomes Ready; its status is reported as Unknown.
In the helm-controller logs (`kubectl -n flux-system logs helm-controller-<id>`) I have:

```
{"level":"debug","ts":"2023-12-18T20:53:52.682Z","logger":"events","msg":"Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:\nCustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)\nCustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)","type":"Warning","object":{"kind":"HelmRelease","namespace":"metallb-system","name":"metallb","uid":"59eb65b4-d800-4f6e-96af-59891565efc6","apiVersion":"helm.toolkit.fluxcd.io/v2beta2","resourceVersion":"181220417"},"reason":"DriftDetected"}
{"level":"debug","ts":"2023-12-18T20:53:52.683Z","msg":"instructed to stop before running drift correction action reconciler correct cluster drift","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb-system"},"namespace":"metallb-system","name":"metallb","reconcileID":"8cc34888-d956-4d75-93d4-87c10f99a24e"}
```
The application is installed successfully, the pods are Ready, and the HelmRelease is marked as Ready. However, the Kustomization never finishes: it keeps trying to reconcile the HelmRelease for some reason.
I tried reconciling manually many times, including with `--with-source`. I also tried removing the health check and setting `wait: true` instead; nothing works.
The only way to make it work is to remove every health check and `wait: true` statement, after which it deploys successfully.
Can you please share the `.status` and the events for the HelmRelease object? It looks to me like the controller is continuously observing drift for the release, in which case you should, for example, use ignore rules to exclude certain fields.
The precise fields can be observed in the controller's logs: they are logged as `resource modified` messages at debug level, with a `patch` field attached to them.
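To pull those entries out of a long log stream, a small filter over the JSON-formatted helm-controller output can help. This is a minimal sketch, assuming the controller runs at debug level (for example with `--log-level=debug`) and that each drift entry is a JSON line whose `msg` is `resource modified` with a `patch` key, as described above; the exact shape of the `patch` value is not assumed. The function name `drift_patches` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def drift_patches(lines: Iterable[str]) -> Iterator[dict]:
    """Yield 'resource modified' debug entries from JSON-formatted controller logs.

    Assumption: each relevant log line is a JSON object with a "msg" field and,
    for drift entries, a "patch" field. Other lines are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # skip console-formatted or empty lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("msg") == "resource modified":
            yield {
                "ts": entry.get("ts"),
                "release": entry.get("HelmRelease"),
                "patch": entry.get("patch"),
            }
```

For example, pipe `kubectl -n flux-system logs deploy/helm-controller` into a script that loops over `drift_patches(sys.stdin)` and prints each result with `json.dumps`.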
Here is the `.status` of the HelmRelease:
```yaml
status:
  conditions:
    - lastTransitionTime: "2023-12-19T11:22:43Z"
      message: Helm install succeeded for release metallb-system/metallb.v1 with chart metallb@0.13.12
      observedGeneration: 4
      reason: ProgressingWithRetry
      status: "True"
      type: Reconciling
    - lastTransitionTime: "2023-12-18T20:11:17Z"
      message: Helm install succeeded for release metallb-system/metallb.v1 with chart metallb@0.13.12
      observedGeneration: 1
      reason: InstallSucceeded
      status: "True"
      type: Ready
    - lastTransitionTime: "2023-12-18T20:11:17Z"
      message: Helm install succeeded for release metallb-system/metallb.v1 with chart metallb@0.13.12
      observedGeneration: 1
      reason: InstallSucceeded
      status: "True"
      type: Released
  helmChart: flux-system/metallb-system-metallb
  history:
    - chartName: metallb
      chartVersion: 0.13.12
      configDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
      digest: sha256:e524142b85ae05a16d30ba30962e2a175d6381995bc71d463a97794211a15c98
      firstDeployed: "2023-12-18T20:11:04Z"
      lastDeployed: "2023-12-18T20:11:04Z"
      name: metallb
      namespace: metallb-system
      status: deployed
      version: 1
  lastAttemptedConfigDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
  lastAttemptedGeneration: 4
  lastAttemptedReleaseAction: install
  lastAttemptedRevision: 0.13.12
  lastHandledReconcileAt: "2023-12-18T21:53:52.159942829+01:00"
  lastHandledResetAt: "2023-12-18T21:53:52.159942829+01:00"
  observedGeneration: -1
  storageNamespace: metallb-system
```
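This status explains why the Kustomization health check never passes even though `Ready` is `True`: `Reconciling` is `True` and `observedGeneration` is `-1`. The sketch below is a simplified approximation of the kstatus-style rules that Flux health checks are built on; it is not the real kstatus library. It assumes `metadata.generation` is 4, which `lastAttemptedGeneration` suggests but the thread does not show. The function name `health_status` is hypothetical.

```python
from typing import Any


def health_status(obj: dict[str, Any]) -> str:
    """Simplified kstatus-style evaluation (an approximation, not the real library).

    - observedGeneration present and != metadata.generation -> "InProgress"
    - Stalled=True -> "Failed"
    - Reconciling=True -> "InProgress", even if Ready=True
    - Ready=True -> "Current"; anything else -> "InProgress"
    """
    metadata = obj.get("metadata", {})
    status = obj.get("status", {})
    observed = status.get("observedGeneration")
    if observed is not None and observed != metadata.get("generation"):
        return "InProgress"  # controller has not finished processing the latest spec
    conditions = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    if conditions.get("Stalled") == "True":
        return "Failed"
    if conditions.get("Reconciling") == "True":
        return "InProgress"
    if conditions.get("Ready") == "True":
        return "Current"
    return "InProgress"


hr = {
    "metadata": {"generation": 4},  # assumption: inferred from lastAttemptedGeneration
    "status": {
        "observedGeneration": -1,
        "conditions": [
            {"type": "Reconciling", "status": "True", "reason": "ProgressingWithRetry"},
            {"type": "Ready", "status": "True", "reason": "InstallSucceeded"},
            {"type": "Released", "status": "True", "reason": "InstallSucceeded"},
        ],
    },
}
print(health_status(hr))  # InProgress
```

Either condition alone (the generation mismatch or `Reconciling=True`) is enough to keep the object out of the healthy state, so the health check keeps retrying until it times out.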
I'm trying to set the controller log level to debug. How can I extract this `patch` from the controller logs? I can't see anything when I look at the debug logs.
It should be logged right after "detected changes in cluster state", see:
https://github.com/fluxcd/helm-controller/blob/main/internal/reconcile/atomic_release.go#L377-L387
Without knowing the specific path, you should at least be able to confirm the issue is indeed due to detected drift by excluding the resource in full.
This is what I found in my logs:

```
2023-12-20T00:05:26.960Z debug - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)
2023-12-20T00:05:26.961Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drift
2023-12-20T00:08:38.996Z info HelmRelease/metallb.metallb-system - HelmChart/flux-system/metallb-system-metallb with SourceRef 'HelmRepository/flux-system/metallb-charts' is in-sync
2023-12-20T00:08:39.041Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state
2023-12-20T00:08:39.280Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified
2023-12-20T00:08:39.280Z debug - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)
2023-12-20T00:08:39.296Z info HelmRelease/metallb.metallb-system - running 'correct cluster drift' action with timeout of 5m0s
2023-12-20T00:08:39.318Z debug - Cluster state of release metallb-system/metallb.v1 has been corrected:
CustomResourceDefinition/addresspools.metallb.io configured
CustomResourceDefinition/bgppeers.metallb.io configured
2023-12-20T00:08:39.319Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state
2023-12-20T00:08:39.541Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified
2023-12-20T00:08:39.541Z debug - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)
2023-12-20T00:08:39.542Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drift
```
IIRC, I need to ignore both CRDs, `CustomResourceDefinition/addresspools.metallb.io` and `CustomResourceDefinition/bgppeers.metallb.io`. For some reason these are changed after Helm installs them, right?
We are experiencing this problem as well. On top of that, we also see a wrong status for the MetalLB HelmRelease (MetalLB is probably only an example here).
The status reports `dependency 'monitoring/xx' is not ready`, like so:
```yaml
status:
  conditions:
    - lastTransitionTime: "2024-01-12T11:10:18Z"
      message: dependency 'monitoring/xx' is not ready
      observedGeneration: 17
      reason: ProgressingWithRetry
      status: "True"
      type: Reconciling
    - lastTransitionTime: "2024-01-11T10:06:53Z"
      message: dependency 'monitoring/xx' is not ready
      observedGeneration: 3
      reason: DependencyNotReady
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-01-09T16:54:05Z"
      message: Helm install succeeded for release metallb/metallb.v1 with chart [email protected]
      observedGeneration: 1
      reason: InstallSucceeded
      status: "True"
```
But in the helm-controller logs we see:

```
{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"checking 1 dependencies","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"all dependencies are ready","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.117Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.163Z","msg":"running 'correct cluster drift' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.584Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
```

Note the `"msg":"all dependencies are ready"` entry, which contradicts the status above.
Setting the log level to debug showed us the path of the (automatically) changed data. Adding the following to the MetalLB HelmRelease fixed the reconciliation:
```yaml
driftDetection:
  ignore:
    - paths:
        - /spec/conversion/webhook/clientConfig/caBundle
      target:
        kind: CustomResourceDefinition
```
Still, I think the status message needs to be fixed: it does not seem to change when the HelmRelease goes into `ProgressingWithRetry`. My guess (without looking into the code) is that it just keeps the previous status message.
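The status pasted above supports this guess: `Reconciling` carries `observedGeneration: 17`, while `Ready` still carries the `DependencyNotReady` message from `observedGeneration: 3`. A quick way to spot such stale conditions is to compare each condition's `observedGeneration` against the newest one. This is a hypothetical diagnostic sketch (`stale_conditions` is not part of Flux), and it only uses the two conditions whose `type` is visible in the paste.

```python
from typing import Any


def stale_conditions(conditions: list[dict[str, Any]]) -> list[str]:
    """Return the types of conditions whose observedGeneration lags the newest one."""
    newest = max((c.get("observedGeneration", 0) for c in conditions), default=0)
    return [c["type"] for c in conditions if c.get("observedGeneration", 0) < newest]


conds = [
    {"type": "Reconciling", "observedGeneration": 17, "reason": "ProgressingWithRetry",
     "message": "dependency 'monitoring/xx' is not ready"},
    {"type": "Ready", "observedGeneration": 3, "reason": "DependencyNotReady",
     "message": "dependency 'monitoring/xx' is not ready"},
]
print(stale_conditions(conds))  # ['Ready']
```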
See also: https://github.com/metallb/metallb/issues/1681