flux2
Feature Request: Kustomization logging to ignore `unchanged`
Describe the bug
Consider the following log:
{"level":"info","ts":"2023-07-04T15:44:34.049Z","msg":"server-side apply for cluster definitions completed","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"eks-cluster-konf","namespace":"flux-system"},"namespace":"flux-system","name":"eks-cluster-konf","reconcileID":"9a5d01e7-2917-4450-a72b-0940f8e9c6de","output":{"Namespace/cbd-production":"unchanged","Namespace/cbd-testing":"unchanged","Namespace/registry-dev":"unchanged","Namespace/registry-prod":"unchanged","Namespace/registry-staging":"unchanged","Namespace/rn-prod":"unchanged","Namespace/rp-production":"unchanged","Namespace/rp-testing":"unchanged","Namespace/rp-uat":"unchanged","Namespace/rpa-dev":"unchanged","Namespace/uv-dev":"unchanged"}}
If this is parsed into separate fields, the output produces:
key | value |
---|---|
output.Namespace/cbd-production | unchanged |
output.Namespace/cbd-testing | unchanged |
output.Namespace/registry-dev | unchanged |
output.Namespace/registry-prod | unchanged |
output.Namespace/registry-staging | unchanged |
output.Namespace/rn-prod | unchanged |
output.Namespace/rp-production | unchanged |
output.Namespace/rp-testing | unchanged |
output.Namespace/rp-uat | unchanged |
output.Namespace/rpa-dev | unchanged |
output.Namespace/uv-dev | unchanged |
This is only a small example; in production environments this can produce hundreds of fields. I'm writing these logs into Elasticsearch, where the number of fields gets large and can produce errors.
It would be better to only log the elements that have been updated.
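Until something like that exists, a workaround on the log-shipping side might look like the sketch below. It is not Flux code and not an existing Flux option; `drop_unchanged` is a hypothetical helper, and the sample line is a trimmed-down version of the log above with one value changed to "configured" for illustration.

```python
import json


def drop_unchanged(log_line: str) -> str:
    """Return the JSON log line with "unchanged" entries removed from "output"."""
    record = json.loads(log_line)
    output = record.get("output")
    # Only touch records whose "output" is a resource -> action mapping.
    if isinstance(output, dict):
        record["output"] = {
            resource: action
            for resource, action in output.items()
            if action != "unchanged"
        }
    return json.dumps(record)


# Trimmed sample record (assumption: real records carry more fields).
sample = (
    '{"level":"info","msg":"server-side apply completed",'
    '"output":{"Namespace/cbd-production":"unchanged",'
    '"Namespace/cbd-testing":"configured"}}'
)
filtered = json.loads(drop_unchanged(sample))
print(filtered["output"])
```

This only hides the problem downstream; the controller would still emit the full map.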
Steps to reproduce
Standard behaviour.
Expected behavior
Preferably, drop the "field":"unchanged" entries and log only the elements that were updated.
Screenshots and recordings
n/a
OS / Distro
AWS EKS ghcr.io/fluxcd/kustomize-controller:v0.32.0
Flux version
flux version 0.38.2
Flux check
$ flux check
► checking prerequisites
✗ flux 0.38.2 <2.0.0-rc.5 (new version is available, please upgrade)
✔ Kubernetes 1.24.14-eks-c12679a >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.28.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.28.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.23.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.32.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.30.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.33.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta1
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta1
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta1
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed
Git provider
github
Container Registry provider
Github
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
There is a change in 2.0.1 to exclude "skipped" resources, but I think that's not the same as what you're asking about. Those resources are skipped because they have a reconcile: disabled annotation, and they were triggering spurious messages for resources that were not updated. The messages in this issue are actually accurate; they just don't tell you about a change, so it would be nice to exclude them.
I know we try to avoid knobs that can affect the global behavior of Flux, so I wonder if there's some reason we need these in the default output, or if they could just be excluded from the logs altogether?
It's not an API surface, so I think we should be able to make changes here if they make sense (even though we've already incremented the major version). These messages where nothing happens could be moved to the debug log, say, but I don't know what other services might depend on the information about what isn't happening, for example to render it visually.
(Could this information be used somehow in Weave GitOps UI, or does it read everything it needs from the Kustomization inventory?)
The logs and the events are typically bound together, so I'm not sure if this change might have unanticipated consequences.
This issue also affects us. We have hundreds of Kustomization objects managing thousands of Kubernetes resources. Because the field names in the JSON log output are the namespace+name of the objects under management, the schema for the logs has an unbounded list of fields.
Our Elasticsearch indices that store flux-system logs were set to a limit of 32,000 fields (instead of the default 1,000) so that the logs could be indexed without being dropped. We've just hit that limit and will have to increase it again.
While not logging unchanged objects would be one approach, the principal issue is the unbounded cardinality of having the field name be the name of the Kubernetes object.
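To make the cardinality point concrete, here is a rough sketch that counts the distinct dotted field names a flattening indexer would see across a batch of log lines. It only approximates how Elasticsearch counts mapped fields, and `field_names` and `distinct_fields` are hypothetical helper names, not part of any library.

```python
import json


def field_names(record: dict, prefix: str = ""):
    """Yield dotted leaf-field names, e.g. "output.Namespace/cbd-testing"."""
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects contribute one field per leaf key.
            yield from field_names(value, prefix=f"{name}.")
        else:
            yield name


def distinct_fields(log_lines) -> set:
    """Collect every distinct field name seen across the given JSON log lines."""
    seen = set()
    for line in log_lines:
        seen.update(field_names(json.loads(line)))
    return seen


# Two trimmed sample records for different resources (illustrative only).
lines = [
    '{"msg":"a","output":{"Namespace/cbd-production":"unchanged"}}',
    '{"msg":"b","output":{"Namespace/cbd-testing":"unchanged"}}',
]
fields = distinct_fields(lines)
print(len(fields), sorted(fields))
```

Every new resource name adds a new field, so the count grows with the number of managed objects rather than with the number of log attributes.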
How about, instead of the current output structure, which looks like this:
{
  "output": {
    "Namespace/cbd-production": "unchanged",
    "Namespace/cbd-testing": "configured",
    "Namespace/registry-dev": "configured",
    "Namespace/registry-prod": "unchanged",
    "Namespace/registry-staging": "unchanged",
    "HelmRelease/cbd-testing/my-app": "configured",
    "HelmRelease/cbd-production/my-app": "unchanged"
  }
}
We break it into the possible actions (I have found "unchanged" and "configured" in our logs, but there may be more), with the value of each being a list of the object names:
{
  "output": {
    "unchanged": [
      "Namespace/cbd-production",
      "Namespace/registry-prod",
      "Namespace/registry-staging",
      "HelmRelease/cbd-production/my-app"
    ],
    "configured": [
      "Namespace/cbd-testing",
      "Namespace/registry-dev",
      "HelmRelease/cbd-testing/my-app"
    ]
  }
}
This solves the cardinality issue, though it might still produce large log lines, so maybe logging the individual unchanged objects could be optional or demoted to debug/info?
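For illustration, the regrouping proposed above amounts to inverting the map. The sketch below is Python purely to show the transformation; it is not the controller's code (Flux is written in Go), and `group_by_action` is a hypothetical name.

```python
from collections import defaultdict


def group_by_action(output: dict) -> dict:
    """Invert a resource -> action map into an action -> [resources] map."""
    grouped = defaultdict(list)
    for resource, action in output.items():
        grouped[action].append(resource)
    return dict(grouped)


# The current-structure example from this comment.
current = {
    "Namespace/cbd-production": "unchanged",
    "Namespace/cbd-testing": "configured",
    "Namespace/registry-dev": "configured",
    "Namespace/registry-prod": "unchanged",
    "Namespace/registry-staging": "unchanged",
    "HelmRelease/cbd-testing/my-app": "configured",
    "HelmRelease/cbd-production/my-app": "unchanged",
}
proposed = group_by_action(current)
print(sorted(proposed))
```

With this shape the set of field names is bounded by the number of distinct actions, however many resources are managed.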