k8s-gitops icon indicating copy to clipboard operation
k8s-gitops copied to clipboard

cert-manager crds race condition

Open ChipWolf opened this issue 3 years ago • 5 comments

There's a race condition installing CRDs for cert-manager when setting this cluster up from scratch.

The helm release has installCRDs set to true, but there's also a Kustomization with the CRDs which Flux tends to install before the chart. The chart refuses to install b/c there's existing CRDs which don't belong to Helm.

ChipWolf avatar Mar 01 '21 23:03 ChipWolf

Thanks @ChipWolf. This is a problem, and I'm not sure the best approach for fixing it:

  • Helm will not upgrade changes to CRDs beyond whatever version they were upon initial installation
  • Therefore, it stands to reason that CRDs should be installed and upgraded outside of helm

The race condition you describe suggests that setting the helm-to-install-CRDs flag to false should resolve the issue. There is some risk in doing this (as I learned painfully with the rook chart) in that flipping the flag from true to false will cause helm to remove the CRDs as part of the change action, despite the fact that they are being installed separately.

I need to noddle on the best way to handle this. In this particular situation it won't be too disruptive if this happens with cert-manager (or if I simply completely blow-away the cert-manager HelmRelease and let flux reconcile a fresh installation), because it should just request a new wildcard cert.

billimek avatar Mar 03 '21 13:03 billimek

You could add a strategic merge patch to the CRD's Kustomization to add the labels Helm expects to see; that'd keep Helm happy & would allow the CRDs to be upgraded dynamically.

It sounds awfully sketchy, but 🤷‍♂️

ChipWolf avatar Mar 04 '21 17:03 ChipWolf

As things stand right now in this repo's configuration with flux2, even installing the cert-manager CRDs out-of-band from the helm chart causes issues and failures within flux to reconcile. The reason for this is that flux appears to be trying to apply objects leveraging those CRDs before the CRDs are applied, and is failing fast instead of becoming eventually consistent.

Based on a discussion in discord, it looks like there are two paths:

  1. Disable flux validation to allow the objects to eventually become consistent and work. I don't like this idea because there are a lot of benefits from the validations beyond this specific case.
  2. Do some kustomization voodoo magic to 'force' the CRDs (or chart if that's the entity applying the CRDs) to get applied before the consuming objects (e.g. ClusterIssuers or Certificates). Like you are hinting at, @ChipWolf . his will require some more research on my part. I've been trying to avoid restructuring this repo to fix the flux opiniation model, but may reconsider this if it makes implementing the ordering solution 'easier'.

billimek avatar Mar 04 '21 18:03 billimek

With cert-manager, I ended up going the kustomization route to resolving the CRD issues, in the end this allowed me to also break up my flux repo into production and staging

echel0n avatar Mar 04 '21 21:03 echel0n

Related to #2870 ?

onedr0p avatar Apr 30 '23 20:04 onedr0p