kustomize-controller

"wait: true" doesn't wait and throws event "ReconciliationSucceeded" before resources are deployed

Open alexander-matthiesen opened this issue 2 years ago • 13 comments

We are currently migrating our services to GitOps with FluxCD. We are struggling with a few things in this process, but one of our main blockers right now is that the kustomize-controller does not wait.

We have a multi-tenant setup, and each of our microservices has an application repository and an environment repository. The application repository contains the source code, tests and documentation. The environment repository contains the environment configuration for the different stages (separated by folder, not by branch).

Here is our workflow, to give some context on why we need "wait":

  1. The developer pushes to a branch of the application repository
  2. The Docker image, Helm chart and other artifacts are built in the application repository
  3. A GitLab job writes the current version to the folder "stages/dev" in the environment repository
  4. (Flux detects that change and deploys the application)
  5. A custom controller (pipeline-trigger) watches the tenant Kustomization and looks for "ReconciliationSucceeded"
  6. The pipeline-trigger triggers the application repository's pipeline, which runs e2e tests against the deployed application
  7. The flow continues from step 3 for the stages "INT", "QA" and "PROD"
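To make the role of "wait" in steps 3 to 7 concrete, here is a minimal Python sketch of the promotion loop. The callables write_version, wait_until_deployed and run_e2e are hypothetical stand-ins for the GitLab job, the pipeline-trigger controller and the e2e pipeline; the point is that step 6 must not start until wait_until_deployed has actually observed the new version as deployed and healthy.

```python
from __future__ import annotations

from typing import Callable, Optional

# Stage order taken from steps 3 and 7 of the workflow.
STAGES = ("dev", "int", "qa", "prod")

# Hypothetical hooks, each called with (stage, version).
WriteFn = Callable[[str, str], None]
CheckFn = Callable[[str, str], bool]


def promote(version: str,
            write_version: WriteFn,
            wait_until_deployed: CheckFn,
            run_e2e: CheckFn) -> Optional[str]:
    """Promote `version` through all stages in order.

    Returns the stage where promotion stopped, or None if every stage passed.
    """
    for stage in STAGES:
        # Step 3: commit the version to stages/<stage> in the env repo.
        write_version(stage, version)
        # Steps 4-5: block until Flux reports the deployment as healthy.
        # If this returns too early (the problem in this issue),
        # the e2e tests below run against the previous version.
        if not wait_until_deployed(stage, version):
            return stage
        # Step 6: run the e2e tests against the deployed stage.
        if not run_e2e(stage, version):
            return stage
    return None
```

With fake hooks, promote returns "qa" if the QA e2e tests fail, and INT/PROD ordering is enforced simply by the loop; the hard part in practice is implementing wait_until_deployed reliably.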

Currently the issue is that step 6 is executed immediately after the GitRepository has triggered the reconciliation, before the application is actually deployed.

So the questions for me are:

  • Do we monitor the wrong events?
  • Is FluxCD not waiting at all?
  • What are the best interval settings to use together with "wait"?
  • Is there a better way to wait for a deployment and trigger a GitLab Pipeline after a successful deployment?

Thanks in advance!

alexander-matthiesen, May 17 '22

@alexander-matthiesen, can you check for healthy? We had a short discussion about this with @stefanprodan at KubeCon.

haarchri, May 20 '22

@haarchri you mean the event "healthy"? I will test a few things and come back here.

alexander-matthiesen, May 20 '22

Yes, there is a dedicated condition in the status named Healthy, and an event is issued to reflect its state.

stefanprodan, May 22 '22
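Because Healthy is a condition rather than an event reason, a consumer that polls the Kustomization object can read it directly from status.conditions. A minimal Python sketch, assuming the object has been fetched as JSON (for example with `kubectl get kustomization <name> -n flux-system -o json`) and that the API version in use exposes a Healthy condition as described above. Comparing status.lastAppliedRevision with the revision you expect guards against reading a Healthy condition left over from the previous deployment.

```python
from __future__ import annotations

from typing import Optional


def get_condition(obj: dict, cond_type: str) -> Optional[dict]:
    """Return the entry in status.conditions with the given type, if any."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def is_healthy(kustomization: dict, expected_revision: Optional[str] = None) -> bool:
    """True if the Healthy condition has status "True" and, when
    expected_revision is given, the last applied revision matches it."""
    cond = get_condition(kustomization, "Healthy")
    if cond is None or cond.get("status") != "True":
        return False
    if expected_revision is not None:
        applied = kustomization.get("status", {}).get("lastAppliedRevision")
        return applied == expected_revision
    return True
```

Polling the condition avoids depending on which events the controller chooses to send, at the cost of an extra API call per check.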

Thanks for the fast reply!

So currently only the condition is set to Healthy, but we don't get a corresponding event from the kustomize-controller, only "ReconciliationSucceeded", "HealthCheckFailed" or "ReconciliationFailed".

Does the "Healthy" condition also trigger an event?

In rare cases a HelmChart has the status "ArtifactFailed". Will this also be reflected as "HealthCheckFailed" in the kustomize-controller, or do we have to observe those events / alerts separately?

Currently we use the notification-controller to determine when to trigger our post-deploy pipeline (which should also notify the developers when something goes wrong).

alexander-matthiesen, May 23 '22

You can see the logic here https://github.com/fluxcd/kustomize-controller/blob/main/controllers/kustomization_controller.go#L886

To avoid spam, we only send a healthy event if the previous reconciliation had HealthCheckFailed or if the source revision is newer.

stefanprodan, May 23 '22
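The rule described above can be paraphrased as a small predicate. This is a simplified Python restatement for illustration only, not the controller's actual Go code; ReconcileState is a hypothetical type, and "newer" is approximated as "different from the last applied revision".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconcileState:
    # Hypothetical, simplified view of what is known about the previous run.
    last_reason: Optional[str]    # reason of the previous reconciliation
    last_revision: Optional[str]  # source revision applied last time


def should_emit_healthy_event(prev: ReconcileState, new_revision: str) -> bool:
    """Send "Health check passed" only on recovery or for a new revision."""
    recovered = prev.last_reason == "HealthCheckFailed"
    new_source = new_revision != prev.last_revision
    return recovered or new_source
```

The case that returns False, a periodic reconciliation of an unchanged revision after a successful run, is exactly the one that produces no health-check event.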

Which version of Flux is installed in these clusters?

haarchri, May 23 '22

Ah, the last part is important: "if the source revision is newer". That's good to hear.

We are using Flux version 0.24.0 in our "intermediate" cluster.

alexander-matthiesen, May 23 '22

@stefanprodan, is it really an event with the reason "Healthy", or is it the condition that is set? Currently we only monitor the reason of the events, not the conditions of specific controllers / resources.

alexander-matthiesen, Jun 01 '22

The reason for the health check passed event is not "Healthy" but "Progressing".

(I reformatted the received message as logfmt.)

2022-06-01T18:50:55.181570300Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=Progressing message=Health check passed in 30.0546639s

So I would expect either "HealthCheckPassed" (matching "HealthCheckFailed") or "Healthy", as the documentation says.

Could someone verify my findings, or am I completely off track?

Here is the full log:

2022-06-01T18:30:06.940101700Z  * Environment: production
2022-06-01T18:30:06.940184900Z    WARNING: This is a development server. Do not use it in a production deployment.
2022-06-01T18:30:06.940202900Z    Use a production WSGI server instead.
2022-06-01T18:30:06.940222200Z  * Debug mode: off
2022-06-01T18:32:45.589147400Z INFO:app:controller=source-controller namespace=flux-system kind=GitRepository name=flux-system reason=NewArtifact message=stored artifact for commit 'Testing'
2022-06-01T18:32:45.627826900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=DependencyNotReady message=Dependencies do not meet ready condition, retrying in 30s
2022-06-01T18:32:46.301167000Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 690.127ms, next run in 30m0s
2022-06-01T18:32:46.610970800Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 998.5614ms, next run in 10m0s
2022-06-01T18:33:15.653958700Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=ReconciliationSucceeded message=Reconciliation finished in 62.0026ms, next run in 30m0s
2022-06-01T18:35:25.794200000Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=HealthCheckFailed message=Health check failed after 5m0.0113057s, timeout waiting for: [HelmRelease/service-alpha-api-dev/podinfo status: 'InProgress']
2022-06-01T18:35:25.864155900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=Progressing message=HelmRelease/service-alpha-api-dev/podinfo configured
2022-06-01T18:37:12.543642900Z INFO:app:controller=source-controller namespace=flux-system kind=GitRepository name=flux-system reason=NewArtifact message=stored artifact for commit 'Testing'
2022-06-01T18:37:12.581935900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=DependencyNotReady message=Dependencies do not meet ready condition, retrying in 30s
2022-06-01T18:37:13.232386200Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 665.782ms, next run in 30m0s
2022-06-01T18:37:13.510053700Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 943.9189ms, next run in 10m0s
2022-06-01T18:37:42.628933700Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=ReconciliationSucceeded message=Reconciliation finished in 77.2732ms, next run in 30m0s
2022-06-01T18:38:05.738422400Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 716.4439ms, next run in 10m0s
2022-06-01T18:40:25.551753800Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=HealthCheckFailed message=Health check failed after 5m0.0139593s, timeout waiting for: [HelmRelease/service-alpha-api-dev/podinfo status: 'InProgress']
2022-06-01T18:42:15.753100800Z INFO:app:controller=source-controller namespace=flux-system kind=GitRepository name=flux-system reason=NewArtifact message=stored artifact for commit 'Testing'
2022-06-01T18:42:15.825794800Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=DependencyNotReady message=Dependencies do not meet ready condition, retrying in 30s
2022-06-01T18:42:16.352460700Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 557.7511ms, next run in 30m0s
2022-06-01T18:42:16.618117000Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 821.3467ms, next run in 10m0s
2022-06-01T18:42:45.838967300Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=ReconciliationSucceeded message=Reconciliation finished in 59.6609ms, next run in 30m0s
2022-06-01T18:45:18.076435800Z INFO:app:controller=source-controller namespace=flux-system kind=GitRepository name=flux-system reason=NewArtifact message=stored artifact for commit 'Testing'
2022-06-01T18:45:18.128160900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=DependencyNotReady message=Dependencies do not meet ready condition, retrying in 30s
2022-06-01T18:45:18.763593600Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 659.4845ms, next run in 30m0s
2022-06-01T18:45:18.995367200Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 890.3748ms, next run in 10m0s
2022-06-01T18:45:25.325476000Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=HealthCheckFailed message=Health check failed after 5m0.0131973s, timeout waiting for: [HelmRelease/service-alpha-api-dev/podinfo status: 'InProgress']
2022-06-01T18:45:25.390171500Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=Progressing message=HelmRelease/service-alpha-api-dev/podinfo configured
2022-06-01T18:45:48.158143600Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=ReconciliationSucceeded message=Reconciliation finished in 59.5706ms, next run in 30m0s
2022-06-01T18:47:20.040607900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 599.7491ms, next run in 30m0s
2022-06-01T18:47:20.302696500Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 865.9028ms, next run in 10m0s
2022-06-01T18:47:49.496037300Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=tenants reason=ReconciliationSucceeded message=Reconciliation finished in 75.063ms, next run in 30m0s
2022-06-01T18:48:05.791270900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=flux-system reason=ReconciliationSucceeded message=Reconciliation finished in 729.7683ms, next run in 10m0s
2022-06-01T18:48:44.543229600Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=infrastructure reason=ReconciliationSucceeded message=Reconciliation finished in 386.725ms, next run in 30m0s
2022-06-01T18:50:25.086405700Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=HealthCheckFailed message=Health check failed after 5m0.0163686s, timeout waiting for: [HelmRelease/service-alpha-api-dev/podinfo status: 'InProgress']
2022-06-01T18:50:25.159989900Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=Progressing message=HelmRelease/service-alpha-api-dev/podinfo configured
2022-06-01T18:50:26.063824400Z INFO:app:controller=source-controller namespace=flux-system kind=HelmChart name=service-alpha-api-dev-podinfo reason=ChartPullSucceeded message=pulled 'podinfo' chart with version '5.2.1'
2022-06-01T18:50:55.181570300Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=Progressing message=Health check passed in 30.0546639s
2022-06-01T18:50:55.227597100Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=ReconciliationSucceeded message=Reconciliation finished in 30.169607s, next run in 30m0s
2022-06-01T18:54:03.514402600Z INFO:app:controller=kustomize-controller namespace=flux-system kind=Kustomization name=service-alpha-api-dev reason=ReconciliationSucceeded message=Reconciliation finished in 92.2743ms, next run in 30m0s

alexander-matthiesen, Jun 01 '22
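If the receiver stores events in the logfmt-style form shown in this log, each line can be turned back into fields before filtering. A minimal Python sketch, assuming every event line contains the INFO:app: prefix and that message is always the last field, since it is the only value that contains spaces:

```python
from __future__ import annotations


def parse_event_line(line: str) -> dict[str, str]:
    """Split one 'INFO:app:key=value ... message=<text>' line into fields.

    Returns an empty dict for lines without the INFO:app: prefix
    (for example the web server's startup output).
    """
    # Drop the timestamp and the 'INFO:app:' logger prefix.
    _, found, payload = line.partition("INFO:app:")
    if not found:
        return {}
    # Everything after ' message=' belongs to the message, spaces included.
    head, sep, message = payload.partition(" message=")
    # The remaining fields are plain space-separated key=value pairs.
    fields = dict(token.split("=", 1) for token in head.split() if "=" in token)
    if sep:
        fields["message"] = message.strip()
    return fields
```

Keeping the parsed fields as a dict makes it easy to filter on kind, name and reason without string matching on the whole line.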

So I would expect either "HealthCheckPassed" (corresponding to "HealthCheckFailed") or "Healthy" like the documentation said.

The documentation states that there is a Healthy condition; I think you are confusing conditions with reasons.

The reason sent with each event is the one belonging to the Ready condition. After a successful health check other steps follow, which is why the reason is still Progressing; at the end, an event with the ReconciliationSucceeded reason is issued.

It goes like this:

  • if there is something to apply: reason=Progressing message=object configured
  • if something was applied and it becomes healthy: reason=Progressing message=Health check passed
  • if no errors occurred, regardless of whether anything was applied or health checked: reason=ReconciliationSucceeded message=Reconciliation finished

To detect the last event that you care about, you can filter by reason=ReconciliationSucceeded.

stefanprodan, Jun 01 '22
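The event sequence described above can be written as a classifier over (reason, message) pairs. A Python sketch, assuming only the message formats seen in this thread ("... configured", "Health check passed ...", "Reconciliation finished ..."); anything else falls through to "other", and the returned stage labels are hypothetical names chosen for illustration.

```python
def classify(reason: str, message: str) -> str:
    """Map a kustomize-controller event to a coarse reconciliation stage."""
    if reason == "Progressing" and message.startswith("Health check passed"):
        return "healthy"   # something was applied and became healthy
    if reason == "Progressing" and message.endswith("configured"):
        return "applied"   # an object was applied to the cluster
    if reason == "ReconciliationSucceeded":
        return "finished"  # last event of a successful run
    if reason in ("HealthCheckFailed", "ReconciliationFailed"):
        return "failed"
    return "other"         # e.g. DependencyNotReady
```

The health-check check comes first because both it and the "configured" case share the Progressing reason and can only be told apart by the message.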

Okay, I understand that there are both reasons and conditions.

Unfortunately, it seems that the ReconciliationSucceeded event is also emitted when there is no new revision but the kustomize-controller has successfully reconciled (so on every interval set in the Kustomization resources). Is that true?

Since there is a HealthCheckFailed reason, would there be any disadvantage to introducing a HealthCheckPassed reason that is emitted when the condition changes back to Healthy? In other words, one that is emitted when the revision changed and the health check actually passed?

This would let us set "wait: true" and simply monitor the reason field for HealthCheckFailed or HealthCheckPassed.

alexander-matthiesen, Jun 02 '22
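Without such a reason, one workaround for the periodic ReconciliationSucceeded events is to act only the first time a given revision succeeds for a given Kustomization. A minimal Python sketch, assuming each event record carries the applied source revision in a field named revision (a hypothetical field name; check what your receiver actually gets from the notification-controller). A known limitation: changes that do not produce a new source revision, such as changes made only through variable substitution, would be treated as already handled.

```python
from __future__ import annotations


class RevisionGate:
    """Fire once per (Kustomization name, revision) on ReconciliationSucceeded."""

    def __init__(self) -> None:
        # Last revision already acted on, per Kustomization name.
        self._seen: dict[str, str] = {}

    def should_trigger(self, event: dict) -> bool:
        if event.get("reason") != "ReconciliationSucceeded":
            return False
        name = event.get("name")
        revision = event.get("revision")  # hypothetical field, see above
        if name is None or revision is None:
            return False
        if self._seen.get(name) == revision:
            # Periodic reconciliation of a revision already handled.
            return False
        self._seen[name] = revision
        return True
```

The state lives in memory here, so a restarted receiver would trigger once more for the current revision; persisting it is left out of the sketch.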

@alexander-matthiesen, you could filter for messages beginning with "Health check passed".

stefanprodan, Jun 02 '22
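Following that suggestion, a filter that keeps only the health-check-passed events of one Kustomization might look like this Python sketch. The event dicts are assumed to have the kind, name and message fields seen in the log earlier in this thread.

```python
from __future__ import annotations

from typing import Iterable, Iterator

HEALTH_PASSED_PREFIX = "Health check passed"


def health_passed(events: Iterable[dict], name: str) -> Iterator[dict]:
    """Yield events for Kustomization `name` that report a passed health check."""
    for event in events:
        if (event.get("kind") == "Kustomization"
                and event.get("name") == name
                and event.get("message", "").startswith(HEALTH_PASSED_PREFIX)):
            yield event
```

Per the rule quoted earlier from kustomization_controller.go, this event is only sent after a failed health check or for a newer source revision, so it does not repeat on every interval the way ReconciliationSucceeded does.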

I just came across this thread because we are using Flux to dynamically create preview namespaces from Kustomizations. We set the image version using variable substitution. Now we would like to be notified when a new environment has been created successfully. According to this thread and L859, that doesn't seem to be possible, since variable substitution doesn't create a new revision in the source.

Is there any way to be notified in these cases?

sch1ldkr0ete, Dec 08 '22