Document how to install cert-manager using gitops and known issues with particular gitops implementations
We could add some documentation briefly explaining how to install cert-manager using gitops systems like Flux or Anthos.
There are some known issues around the installation of CRDs and the subsequent injection of webhook caBundles into those CRDs.
For example, in this conversation in Slack (https://kubernetes.slack.com/archives/C4NV3DWUC/p1599044537041800) it was said that Flux does not work well with cert-manager upgrades. Perhaps this is because, as Flux upgrades the CRDs, it clobbers the injected caBundles and then tries to read the state of the existing certificates before the caBundles have been re-injected. It then gets stuck, because its calls to the K8S API server require a call to the (now unreachable) conversion webhook.
It could very well be that I have misunderstood the problem description, and I certainly don't know very much about Flux.
But it would be good to try it and document any problems that do exist.
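The suspected failure mode above hinges on the conversion webhook's caBundle being empty after the CRDs are re-applied. Below is a minimal Python sketch of how one might detect that state. It assumes the CRD has already been fetched and parsed into a dict (for example from `kubectl get crd <name> -o json`). The helper name `needs_ca_injection` and the sample service name are hypothetical. The field paths follow the `apiextensions.k8s.io/v1` CRD schema.

```python
def needs_ca_injection(crd: dict) -> bool:
    """Return True if a CRD uses a conversion webhook but has no caBundle.

    Hypothetical helper: `crd` is a parsed apiextensions.k8s.io/v1
    CustomResourceDefinition, e.g. loaded from `kubectl get crd -o json`.
    """
    conversion = crd.get("spec", {}).get("conversion") or {}
    # Only CRDs whose conversion strategy is "Webhook" call out to a webhook;
    # the default strategy "None" needs no caBundle at all.
    if conversion.get("strategy") != "Webhook":
        return False
    client_config = (conversion.get("webhook") or {}).get("clientConfig") or {}
    # A missing or empty caBundle is the "clobbered" state: with a webhook
    # serving certificate that is not signed by the API server's system trust
    # roots, conversion calls fail until the bundle is re-injected.
    return not client_config.get("caBundle", "")


# Illustrative CRD fragment as it might look right after being re-applied
# from git (the service name/namespace here are example values).
clobbered = {
    "spec": {
        "conversion": {
            "strategy": "Webhook",
            "webhook": {
                "clientConfig": {
                    "service": {"name": "cert-manager-webhook", "namespace": "cert-manager"}
                }
            },
        }
    }
}
print(needs_ca_injection(clobbered))  # True
```
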
/cc @munnerz @meyskens @jfrancisco0
See also:
- https://github.com/jetstack/cert-manager/issues/2197
My specific issue with cert-manager + Flux is as follows:
- Deploy cert-manager via Flux:
  - Create HelmRelease
  - Create ClusterIssuers
  - Add ingress annotations
- Something happens that breaks the webhook service, or it's broken by a new version. This was common for me in these situations:
  - Webhook isn't reachable (fixed by https://github.com/jetstack/cert-manager/pull/3113)
  - The Helm release, for some reason, ends up in a state other than "Deployed", which prevents a normal upgrade, so I delete the HelmRelease so that Flux redeploys it (which also deletes the webhook service).
- Since Flux is watching some cert-manager custom resources (ClusterIssuers), it starts throwing errors because it can't reach the webhook (which is no longer deployed), and it fails to sync with git.
So, in the end, it won't deploy my updated cert-manager release (or anything else) and I need to either install cert-manager from outside Flux, or delete the CRDs (deleting the CRDs just hangs and doesn't complete successfully, I need to use this workaround https://github.com/kubernetes/kubernetes/issues/60538#issuecomment-369099998). Also note that, in this state, I can't delete the ClusterIssuers (or any other cert-manager resources) with kubectl either.
This is maybe more of an issue with Flux and K8s webhooks than with cert-manager, but perhaps something in the process can be improved. Hope my comment isn't too confusing!
I've also asked a question over in #sig-api-machinery on Slack about this, as I believe it's ultimately caused by the need for the webhook component to be upgraded before the CRDs (and then the controllers that consume those CRDs after that): https://kubernetes.slack.com/archives/C0EG7JC6T/p1599220160144500
That said, the cainjector issue (https://github.com/jetstack/cert-manager/issues/3251) makes this more problematic again. Even once that issue is solved, it implies that the cainjector would need to be deployed alongside the webhook in 'phase 1' of the installation.
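The ordering constraint described in the last two comments forms a small dependency graph: the webhook (and, per the cainjector issue, possibly the cainjector) first, then the CRDs, then the controllers that consume them. Below is a minimal Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The component names and edges are assumptions drawn from this discussion, not an official cert-manager install procedure.

```python
from graphlib import TopologicalSorter

# Each key lists the components that must be upgraded *before* it.
# These edges are assumptions based on the discussion above.
UPGRADE_DEPENDENCIES = {
    "webhook": set(),
    "cainjector": set(),
    "crds": {"webhook", "cainjector"},
    "controller": {"crds"},
}


def install_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group components into phases; everything in one phase can roll out together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes whose predecessors are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


print(install_phases(UPGRADE_DEPENDENCIES))
# [['cainjector', 'webhook'], ['crds'], ['controller']]
```

The point of grouping into phases rather than emitting a flat order is that a GitOps tool applying everything in a single sync effectively collapses all of these phases into one, which is exactly where the ordering breaks down.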
I also think this issue affects more than just GitOps users. Technically, whilst using kubectl replace or similar would eventually work (unlike in this GitOps case), it also leads to brief outages of the cert-manager API whilst the upgrade is taking place (due to all the issues described here).
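Whatever tool applies the manifests, one way to narrow that outage window is to hold back the next step of an upgrade until the previous one is actually usable, for example until the caBundle has been re-injected. Below is a generic polling sketch in Python. `wait_for` and the simulated check are hypothetical. In a real cluster, the check would re-read the CRD from the API server and inspect its `spec.conversion.webhook.clientConfig.caBundle` field.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated check (hypothetical): pretend the caBundle appears on the third poll.
calls = {"n": 0}


def ca_bundle_injected() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_for(ca_bundle_injected, timeout=5.0, interval=0.01))  # True
```
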
The tutorial "A Complete Step by Step Guide to Implementing a GitOps Workflow with Flux" includes an example of deploying cert-manager using Flux.
It would be good to get this done, as the number of user issues and questions related to this seems to be increasing.
Related issue cert-manager#3291
/priority important-soon
We are currently short of bandwidth. I think this would still be a valuable addition to the documentation if someone is willing to contribute this.
Hey, can I work on this?