
dependsOn dependency chains slow down Kustomizations considerably

Open · wvh opened this issue · 2 comments

Describe the bug

For a Flux setup that involves a lot of chained Kustomizations using dependsOn, for example:

bootstrap --> controllers --> databases --> post-config --> app1 / app2 / app3
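To make the cost of that chain concrete, here is a minimal Go sketch that models the example above as a plain dependsOn map (not the Flux API types) and counts how many upstream levels must become Ready, one after another, before each Kustomization may apply. The map and the hops function are illustrative assumptions, and the graph is assumed to be acyclic.

```go
package main

import "fmt"

// dependsOn maps each Kustomization name to the Kustomizations it depends on.
// Hypothetical model of the example chain; not the Flux API types.
var dependsOn = map[string][]string{
	"bootstrap":   nil,
	"controllers": {"bootstrap"},
	"databases":   {"controllers"},
	"post-config": {"databases"},
	"app1":        {"post-config"},
	"app2":        {"post-config"},
	"app3":        {"post-config"},
}

// hops returns the length of the longest dependsOn path above name, i.e. how
// many upstream Kustomizations must become Ready in sequence before name can
// apply. Assumes the graph has no cycles.
func hops(deps map[string][]string, name string) int {
	longest := 0
	for _, parent := range deps[name] {
		if h := hops(deps, parent) + 1; h > longest {
			longest = h
		}
	}
	return longest
}

func main() {
	for _, k := range []string{"bootstrap", "controllers", "databases", "post-config", "app1"} {
		fmt.Printf("%-12s waits on %d upstream level(s)\n", k, hops(dependsOn, k))
	}
}
```

In this model app1 sits four levels deep, so every change that reaches it has to pass through four sequential readiness checks first.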

The first synchronisation is necessarily slow because Flux has to create the cluster resources from scratch. However, subsequent simple changes to business apps trigger every Kustomization in the chain, and each change has to wait for all the upstream health checks to pass.

This isn't strictly a bug, but it would be nice to have a kind of dependency where a Kustomization, like bootstrap, is assumed healthy once it has passed at least once. That way, a small change to a Kustomization at the end of the chain would not have to wait a potentially long time to be applied while Flux crawls through the whole dependency chain.

The logic here is that most "infrastructure" Kustomizations rarely change, while "app" Kustomizations change very often.
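The difference between today's behaviour and the proposed dependency kind can be sketched as a readiness predicate. This is a hypothetical Go model only: DependencyMode, ReadyOnce, DependencyStatus and canApply are invented names for illustration and are not Flux API fields.

```go
package main

import "fmt"

// DependencyMode is a hypothetical knob, not a Flux API field.
type DependencyMode int

const (
	// RequireReady models the current behaviour: each dependency must be Ready now.
	RequireReady DependencyMode = iota
	// ReadyOnce models the proposal: one past successful reconciliation is enough.
	ReadyOnce
)

// DependencyStatus is a simplified, hypothetical view of a dependency's state.
type DependencyStatus struct {
	ReadyNow  bool // the dependency is currently Ready
	EverReady bool // the dependency has been Ready at least once
}

// canApply reports whether a dependent Kustomization may proceed under mode.
func canApply(mode DependencyMode, deps []DependencyStatus) bool {
	for _, d := range deps {
		switch mode {
		case RequireReady:
			if !d.ReadyNow {
				return false
			}
		case ReadyOnce:
			if !d.EverReady {
				return false
			}
		}
	}
	return true
}

func main() {
	// Assumed scenario: bootstrap is being reconciled again after a source
	// change, so it is not Ready right now, but it has succeeded before.
	bootstrap := []DependencyStatus{{ReadyNow: false, EverReady: true}}
	fmt.Println("require ready:", canApply(RequireReady, bootstrap))
	fmt.Println("ready once:   ", canApply(ReadyOnce, bootstrap))
}
```

Under RequireReady the dependent waits; under the hypothetical ReadyOnce it proceeds immediately, which is the trade-off the proposal asks for: speed for rarely changing infrastructure in exchange for not re-verifying it on every app change.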

It is not obvious to me whether Flux can currently be tricked into doing that without removing the dependsOn logic entirely.

Steps to reproduce

  1. Create a logically ordered chain of Kustomizations that depend on each other
  2. Make regular changes to a Kustomization at the end of that dependency chain
  3. Consistently wait a long time for small changes to be applied

Expected behavior

I guess Flux works as designed. However, Kustomizations often run without any direct changes to them, and Flux could be faster here, either by noticing that by itself or by taking a hint from a hypothetical field that declares the nature of the dependency.

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

v2.2.3

Flux check

N/A

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

wvh · Apr 22 '24 15:04

See this example of how to speed up dependency checks with --requeue-dependency: https://fluxcd.io/flux/installation/configuration/vertical-scaling/#increase-the-number-of-workers-and-limits

stefanprodan · Apr 22 '24 16:04

Besides setting a shorter value for --requeue-dependency, the key to making changes progress rapidly is to reduce interdependency and add Receivers. Both are complicated topics, and you can't always remove a link from the dependency chain. However, --requeue-dependency defaults to 30s, so you can make this issue much better right away by lowering it to somewhere in the range of 2-5s. With that change, a dependency chain like yours, of length 4, can go from a wait of over 2 minutes to a best case of less than 30s.

Teasing apart your config sources is the next step to further improve the best case.

If an update to "app1" comes from a change within the bootstrap repo, then it will necessarily trigger every Flux Kustomization in the chain that shares the bootstrap repo as its source, because every Kustomization downstream of a Source that changed needs to perform a dry-run and check for changes. But if each app is delivered from (say) its own independent OCIRepository, a change to that app does not trigger the parent Kustomizations that are sourced from the bootstrap repo or any other upstream config repo. To achieve this, you can use semver wildcards so that the latest version of an OCI-hosted app is rolled out without any change to the Flux configuration upstream of it in the bootstrap repo.

Then your app deployment can start progressing immediately, with no requeues, within as little as 2-5s of being pushed. This fast timing is only possible with a separate source for the apps' config data plus Receiver webhooks.

The key issue with your configuration (unstated, as far as I can tell) is that you are apparently using a single source for everything. If the configuration lives in a monorepo, you can still potentially overcome this by adding an independent OCIRepository, fed by its own flux push artifact workflow, for each independent config source to stand in for the GitRepository, so that changes in different parts of the repo don't cross-talk like this.

A change in app1/ or post-config should trigger only the right-hand side of the monolith, not the whole chain of parents (bootstrap, controllers, databases, and so on). So you can probably split this repo into at least "apps" and "infra" to help address the performance issue here without necessarily "going full microservices."

kingdonb · Apr 23 '24 11:04