Feature Request: Kustomization logging to ignore `unchanged`

Open andrew-pickin-epi opened this issue 1 year ago • 2 comments

Describe the bug

Consider the following log:

{"level":"info","ts":"2023-07-04T15:44:34.049Z","msg":"server-side apply for cluster definitions completed","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"eks-cluster-konf","namespace":"flux-system"},"namespace":"flux-system","name":"eks-cluster-konf","reconcileID":"9a5d01e7-2917-4450-a72b-0940f8e9c6de","output":{"Namespace/cbd-production":"unchanged","Namespace/cbd-testing":"unchanged","Namespace/registry-dev":"unchanged","Namespace/registry-prod":"unchanged","Namespace/registry-staging":"unchanged","Namespace/rn-prod":"unchanged","Namespace/rp-production":"unchanged","Namespace/rp-testing":"unchanged","Namespace/rp-uat":"unchanged","Namespace/rpa-dev":"unchanged","Namespace/uv-dev":"unchanged"}}

When this is parsed into separate fields, the result is:

key value
output.Namespace/cbd-production unchanged
output.Namespace/cbd-testing unchanged
output.Namespace/registry-dev unchanged
output.Namespace/registry-prod unchanged
output.Namespace/registry-staging unchanged
output.Namespace/rn-prod unchanged
output.Namespace/rp-production unchanged
output.Namespace/rp-testing unchanged
output.Namespace/rp-uat unchanged
output.Namespace/rpa-dev unchanged
output.Namespace/uv-dev unchanged

This is only a small example; in production environments this can produce hundreds of fields. I'm writing these logs into Elasticsearch, where the number of fields grows large and can produce errors.
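To illustrate how the field count grows, here is a minimal sketch of the kind of flattening a log pipeline might apply. The `flatten` helper is hypothetical (it is not part of Flux or Elasticsearch), and the log line is a shortened version of the one above. Every object under `output` becomes its own dotted field name.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that each nested key becomes its own field.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Shortened copy of the log line above; only two namespaces are kept.
line = (
    '{"msg":"server-side apply for cluster definitions completed",'
    '"output":{"Namespace/cbd-production":"unchanged",'
    '"Namespace/cbd-testing":"unchanged"}}'
)

fields = flatten(json.loads(line))
for key, value in fields.items():
    print(key, value)
```

With this approach the number of distinct field names grows with the number of managed objects rather than staying fixed.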

It would be better to only log the elements that have been updated.

Steps to reproduce

Standard behaviour.

Expected behavior

Preferably, drop the "field":"unchanged" entries and log only the elements that were updated.
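As a rough sketch of the requested behaviour, the filter below drops `unchanged` entries from a parsed log record. It is written as a post-processing step on the consumer side, which is an assumption for illustration only; it is not how kustomize-controller builds its log output. The `drop_unchanged` name is hypothetical, and the `configured` value in the sample is made up for the example.

```python
import json


def drop_unchanged(record):
    """Return a copy of the record whose "output" map omits unchanged entries."""
    output = record.get("output")
    if not isinstance(output, dict):
        # Not an apply-result record; pass it through untouched.
        return record
    changed = {name: action for name, action in output.items() if action != "unchanged"}
    return {**record, "output": changed}


# Sample record; the "configured" entry is illustrative, not from the log above.
record = json.loads(
    '{"msg":"server-side apply for cluster definitions completed",'
    '"output":{"Namespace/cbd-production":"unchanged",'
    '"Namespace/cbd-testing":"configured"}}'
)

filtered = drop_unchanged(record)
print(json.dumps(filtered))
```

The original record is left unmodified, so the full output remains available if something else still needs it.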

Screenshots and recordings

n/a

OS / Distro

AWS EKS ghcr.io/fluxcd/kustomize-controller:v0.32.0

Flux version

flux version 0.38.2

Flux check

$ flux check
► checking prerequisites
✗ flux 0.38.2 <2.0.0-rc.5 (new version is available, please upgrade)
✔ Kubernetes 1.24.14-eks-c12679a >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.28.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.28.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.23.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.32.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.30.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.33.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta1
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta1
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta1
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta1
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed

Git provider

github

Container Registry provider

Github

Additional context

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

andrew-pickin-epi, Jul 04 '23 16:07

There is a change in 2.0.1 to exclude "skipped" resources, but I think that's not the same as what you're asking about. Those resources are skipped because they have a reconcile: disabled annotation, and it was triggering spurious messages for resources that were not updated. The "unchanged" messages are actually accurate; they just don't tell you about a change, so it would be nice to exclude them.

I know we try to avoid knobs that can affect the global behavior of Flux, so I wonder if there's some reason we need these in the default output, or if they could just be excluded from the logs altogether?

It's not an API surface, so I think we should be able to make changes here if they make sense (even though we've already incremented the major version) - these messages where nothing happens could be moved to debug log, say, but I don't know what other services might have a dependency on the information about what isn't happening, to render it visually.

(Could this information be used somehow in Weave GitOps UI, or does it read everything it needs from the Kustomization inventory?)

The logs and the events are typically bound together, so I'm not sure if this change might have unanticipated consequences.

kingdonb, Jul 12 '23 12:07

This issue also affects us. We have hundreds of Kustomization objects managing thousands of Kubernetes resources and because the field names in the JSON log output are the namespace+name of the objects under management, this means the schema for the logs has an unbounded list of fields.

Our Elasticsearch indices that store flux-system logs were set to 32,000 fields (instead of the default 1,000) to be able to index the logs without dropping them and we've just hit that limit and will have to increase it again.

While not logging unchanged objects would be one approach, the principal issue is the unbounded cardinality of having the field name be the name of the Kubernetes object.

How about instead of the current output structure of this:

{
  "output": {
    "Namespace/cbd-production": "unchanged",
    "Namespace/cbd-testing": "configured",
    "Namespace/registry-dev": "configured",
    "Namespace/registry-prod": "unchanged",
    "Namespace/registry-staging": "unchanged",
    "HelmRelease/cbd-testing/my-app": "configured",
    "HelmRelease/cbd-production/my-app": "unchanged"
  }
}

We break it up by possible action (I have found "unchanged" and "configured" in our logs, but there may be more), with the value of each being a list of the object names as strings:

{
  "output": {
    "unchanged": [
      "Namespace/cbd-production",
      "Namespace/registry-prod",
      "Namespace/registry-staging",
      "HelmRelease/cbd-production/my-app"
    ],
    "configured": [
      "Namespace/cbd-testing",
      "Namespace/registry-dev",
      "HelmRelease/cbd-testing/my-app"
    ]
  }
}

This solves the cardinality issue, though it might still produce large log lines, so maybe logging the individual unchanged objects could be optional or demoted to debug/info?
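The regrouping proposed above can be sketched as follows. This is only an illustration of the data transformation, written in Python for brevity; the `group_by_action` name is hypothetical and this is not the controller's actual code. The input is the example `output` map from this comment.

```python
import json
from collections import defaultdict


def group_by_action(output):
    """Invert {object name: action} into {action: [object names]}."""
    grouped = defaultdict(list)
    for name, action in output.items():
        grouped[action].append(name)
    # Convert back to a plain dict so it serialises like ordinary JSON.
    return dict(grouped)


# The current structure, as shown in the first example above.
output = {
    "Namespace/cbd-production": "unchanged",
    "Namespace/cbd-testing": "configured",
    "Namespace/registry-dev": "configured",
    "Namespace/registry-prod": "unchanged",
    "Namespace/registry-staging": "unchanged",
    "HelmRelease/cbd-testing/my-app": "configured",
    "HelmRelease/cbd-production/my-app": "unchanged",
}

grouped = group_by_action(output)
print(json.dumps({"output": grouped}, indent=2))
```

The set of top-level keys under `output` is then bounded by the number of distinct actions, not by the number of objects under management.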

andyspiers, May 10 '24 11:05