# Feature: Avoid using `latest` templates by default and consider pinning via CRD
## Motivation
Terraformer template selection currently defaults to the latest git tag. This is undesirable: new template changes may have a Claudie upgrade as a dependency, so the switch to a new template version needs to happen in a controlled manner.
Moreover, for users who override templates for several providers, this issue proposes a new CRD object that avoids copy/pasting code across the `InputManifest.spec.providers` blocks and also improves the UX of migrating all cluster manifests to a new version; in such a case only one modification is required, in the Custom Resource.
## Description
By default, this issue proposes that Claudie always take templates from the git tag identical to the Claudie release (e.g. `v0.9.10`). The repo/tag selection could be overridden via the existing API (https://docs.claudie.io/latest/input-manifest/external-templates/), or alternatively via the CRD below, should that proposal be accepted.
Now, as part of this topic, we propose moving the template reference to a custom CRD:
```yaml
apiVersion: claudie.io/v1beta1
kind: TemplateReference
metadata:
  name: private-templates
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  repository: "https://git.somehost.io/claudie-templates"
  tag: "v0.9.10"
  path: "production"
```
The `path` is further expected to contain `production/terraformer/<provider>` (e.g. `genesiscloud`).
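The path resolution described above can be sketched as a tiny helper. This is a hypothetical illustration of the layout, not actual Claudie code; the function and parameter names are my own:

```go
package main

import (
	"fmt"
	"path"
)

// templateDir builds the in-repo directory for a provider's terraformer
// templates, following the layout <path>/terraformer/<provider>.
// Hypothetical helper for illustration, not actual Claudie code.
func templateDir(base, service, provider string) string {
	return path.Join(base, service, provider)
}

func main() {
	fmt.Println(templateDir("production", "terraformer", "genesiscloud"))
}
```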
On top of this, we suggest always deploying the following custom resource as part of the install guide:
```yaml
apiVersion: claudie.io/v1beta1
kind: TemplateReference
metadata:
  name: upstream-templates
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  repository: "https://github.com/berops/claudie-config"
  tag: "v0.9.10"
  path: "templates"
```
Such a `TemplateReference` will be referenced in the InputManifest in the following way:
```yaml
spec:
  providers:
    - name: genesiscloud
      providerType: genesiscloud
      templateRef: private-templates # optional, defaults to `upstream-templates`
      secretRef:
        name: genesiscloud-secret
        namespace: secrets
```
The CRD, should it exist, will take precedence over the `template:` stanza.
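The proposed precedence (CRD reference wins over the inline stanza, with the pinned upstream default as the fallback) could be sketched as follows. The type and function names are illustrative, not the actual implementation:

```go
package main

import "fmt"

// TemplateSource identifies where templates are pulled from.
type TemplateSource struct {
	Repository string
	Tag        string
}

// resolveTemplates sketches the proposed precedence: a templateRef CRD,
// if present, wins over an inline template stanza; otherwise the default
// upstream-templates reference pinned to the Claudie release tag is used.
// Illustrative sketch only.
func resolveTemplates(crd, inline *TemplateSource, releaseTag string) TemplateSource {
	if crd != nil {
		return *crd // CRD takes precedence
	}
	if inline != nil {
		return *inline // fall back to the inline stanza
	}
	// default: upstream templates pinned to the release tag
	return TemplateSource{Repository: "https://github.com/berops/claudie-config", Tag: releaseTag}
}

func main() {
	fmt.Println(resolveTemplates(nil, nil, "v0.9.10"))
}
```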
## Exit criteria
- [ ] Optionally, migrate the template version pinning API to a CRD
- [ ] By default, always consume the templates from the tag that matches the Claudie release tag
The `TemplateRepository` CRD could be useful if you have many InputManifests referencing the same templates, which would skip some boilerplate and possible copy/paste typos. That's the only pro I can think of compared to the current solution.
Additionally, for future use cases where anyone could define their own version of, say, ansible playbooks/manifests to be deployed, we would also need to implement another CRD that would be referenced per cluster; for this I would like to use the `Settings` CRD introduced in the Envoy PR.
It allows you to reference a `Setting` custom resource at the Role level of the loadbalancer, overwriting the default configs that Claudie ships with custom ones.
```diff
 ...
 loadBalancers:
   roles:
     - name: apiserver
       protocol: tcp
       port: 6443
       targetPort: 6443
       targetPools:
         - control-pools
       settings:
         proxyProtocol: true
         stickySessions: false
+      settingsRef:
+        name: custom-envoy
+        namespace: claudie
```
This could then be further extended to also be used at the kubernetes cluster level for custom ansible playbooks/manifests:
```diff
 kubernetes:
   clusters:
     - name: dev-cluster
       version: 1.27.0
       network: 192.168.2.0/24
+      settingsRef:
+        name: custom-manifests
+        namespace: claudie
       pools:
         control:
           - control-htz
           - control-gcp
         compute:
           - compute-htz
           - compute-gcp
           - compute-azure
           - htz-autoscaled
```
and the `Settings` CRD would look like:
```yaml
apiVersion: claudie.io/v1beta1
kind: Setting
metadata:
  name: custom-envoy
  namespace: claudie
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  envoy:
    lds: |
      ...
    cds: |
      ...
---
apiVersion: claudie.io/v1beta1
kind: Setting
metadata:
  name: custom-manifests
  namespace: claudie
  labels:
    app.kubernetes.io/part-of: claudie
spec:
  ansible:
    playbooks: ["list of items to download and apply"]
```
There could be multiple `Settings` created, or simply a single giant one, that would be referenced at different levels in the InputManifest; in each case, only the context relevant to where it was referenced would be extracted.
So if you referenced `custom-envoy` at the kubernetes cluster level, it would do nothing, as it would extract the `spec.ansible` or `spec.manifests` overwrites, which would be empty.
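The context extraction described above could be sketched like this. The types are a deliberately minimal stand-in for the proposed CRD spec (the real field set would be richer):

```go
package main

import "fmt"

// Setting mirrors the proposed CRD spec: each key holds overrides for
// one context. Sketch only; the field set is illustrative.
type Setting struct {
	Envoy   map[string]string // relevant at the LB role level
	Ansible []string          // relevant at the kubernetes cluster level
}

// extractForCluster returns only the overrides relevant at the
// kubernetes cluster level; referencing an envoy-only Setting there
// yields nothing, as described above.
func extractForCluster(s Setting) []string {
	return s.Ansible
}

func main() {
	customEnvoy := Setting{Envoy: map[string]string{"lds": "..."}}
	fmt.Println(len(extractForCluster(customEnvoy))) // no ansible overrides to extract
}
```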
Note that having a pre-defined version of templates for each release would result in a rolling update of the infrastructure whenever Claudie itself is updated.
@Despire
For clarification, would these `Settings` objects be included in the data passed to the terraform templates? If so, and if this `Settings` CRD can contain any structure users want, this would complement #1756 perfectly, as users would then be able to create any logic they want for provider-specific and/or environment-specific behavior/constraints within templates, with the custom data they need.
As of right now, you would have to use an annotation in the cluster/node pool definition with a JSON string and then unparse it in the template (which isn't possible at the moment, and why I created #1756) to be able to pass arbitrary data to templates. That is really not something I'd like to actually deploy, both because annotations are absolutely not meant for this, and because of the security implications of having (possibly critical) configuration data stored in an annotation as plaintext.
When will this set of FRs be reviewed and, if approved, a PR opened so work on these can start?
Now that #1756 (and by extension #1768) are merged, all kinds of computation can be done in templates, but the issue remains that users can't pass arbitrary data to templates in other ways than annotations/labels on Cluster/NodePool CRDs. Should this FR be worked on next? @Despire @bernardhalas
We are still discussing how exactly we will implement this. As for whether this FR will be worked on next: I would say so, once we agree on an implementation; until then it will be on hold.
> but the issue remains that users can't pass arbitrary data to templates
We'll also try to consider passing user-defined data to the templates, provided the scope of this FR does not require too many changes; if it does, we might handle it as another FR after this one.
This issue is the tip of the iceberg of a larger set of issues, all linked together:
- How to template manifests/playbooks/configs not just for `terraformer`, but also for `ansibler` (Ansible plays), for `kubeEleven` (KubeOne), and for `kuber` (K8s manifests). The concept is going to be designed as part of this issue. The implementation for `terraformer` will also be a goal of this issue.
- We will design how to pass a custom dataset to the templating engine, so that the user has full transparency over which variables and variable keys can be referenced in the user templates.
- We will design an interface for custom cloud provider creation, by passing the full `provider` block from the `InputManifest`, including referenced secrets.
Further down, this issue will deal just with item 1 from this list, and just for the terraformer service. The implementation for the other services (ansibler: VM config management in Ansible; kuber: Claudie default K8s manifests; kubeEleven: K8s provisioning; LB config override) will be treated via individual issues.
There's a need to template several sets of manifests, for different purposes, in different formats, languages, and sizes. The general template interpolation syntax is Go Templates, as Go is the language Claudie is written in.
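For readers unfamiliar with the syntax, here is a minimal Go Templates example of the kind of interpolation user-provided manifests would use. The variable names and the snippet itself are illustrative, not Claudie's actual template dataset:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// snippet is an illustrative template fragment; {{ .Name }} and
// {{ .Region }} stand in for whatever dataset Claudie would expose.
const snippet = `resource "example_vm" "{{ .Name }}" {
  region = "{{ .Region }}"
}
`

// render executes the snippet with the given values.
func render(name, region string) string {
	var buf bytes.Buffer
	t := template.Must(template.New("vm").Parse(snippet))
	if err := t.Execute(&buf, struct{ Name, Region string }{name, region}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Print(render("compute-htz", "eu-central"))
}
```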
Now, there are the following options for organizing the templates.
1. New CRD type that would contain a full template (e.g. one CRD per category), or a CRD with several templates differentiated by keys; e.g.
```yaml
specs:
  playbooks: |
    ...
  k8s-manifests: |
    ...
```
This CR would be referenced in the InputManifest like:
```diff
 specs:
   kubernetes:
     clusters:
       - name: dev-cluster
+        settingsRef: regionalNfsServers
```
The downside is the requirement for a full unwrap of the set of manifests that a user would like to override (e.g. for the kuber service). There's little to no difference whether we separate the template categories by keys (e.g. playbooks, k8s-manifests, ...) or create a different CRD for each (PlaybookSettings, K8sManifestSettings, ...).
2. Store templates in a separate Git repo
Define a `TemplateGitReference` CRD that would be a pointer to a git repo with templates.
```yaml
kind: TemplateGitReference
metadata:
  name: raspberry-optimized-template
specs:
  hostname: git.mycompany.com
  commitRef: feature/privateEdgeRegistry
  secretRef: gitToken
  method: https
  path: 'rpi-gen5/'
```
This template would then be referenced at various places in the InputManifest like:
```diff
 specs:
   kubernetes:
     clusters:
       - name: development
+        templateRef: raspberry-optimized-template
   ...
   loadbalancers:
     roles:
       - name: denver-03-hospital
+        templateRef: raspberry-optimized-template
```
The downside is the additional overhead of the TemplateGitReference CRD pointing to the git repo with the templates. The advantage is that if the same template reference is used multiple times, an upgrade to a new version needs to be done in just one place.
Currently, we're in favor of option no. 2.
Next, the various reference points for the templates; the preferred option is listed first:
- terraformer OpenTofu/Terraform templates: `specs.nodepools.dynamic.[*].terraformRef`, alternatively `specs.providers.[*].terraformRef`
- LB configs: `specs.loadbalancers.roles.[*].lbconfigsRef`, alternatively `specs.loadbalancers.clusters.[*].lbconfigsRef`
- ansibler playbooks: `specs.nodepools.[dynamic|static].[*].playbooksRef` and `specs.loadbalancers.roles.[*].playbookRef`, alternatively `specs.loadbalancers.clusters.[*].playbooksRef`
- kuber manifests: `specs.kubernetes.clusters.[*].addonsRef`
- kubeEleven manifest: `specs.kubernetes.clusters.[*].provisionRef`
It's going to be expected that a single template reference will provide both the templates for kuber and for kubeEleven, because the structure of the git repo with templates will look like:
```
repo
├── addons
├── k8s
├── lbconfigs
├── playbooks
└── terraform
```
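The mapping from each service (or config category) to its subdirectory in this tree could be captured as a simple lookup table. The mapping below is inferred from the proposed layout and ref names, not taken from actual code, and the `"lb"` key is my own shorthand:

```go
package main

import "fmt"

// templateSubdir maps each Claudie service (or config category) to its
// subdirectory in the template repo layout above. Inferred from the
// proposal, not from actual code; the "lb" key is illustrative shorthand.
var templateSubdir = map[string]string{
	"terraformer": "terraform", // OpenTofu/Terraform templates
	"ansibler":    "playbooks", // Ansible plays
	"kuber":       "addons",    // Claudie default K8s manifests
	"kubeEleven":  "k8s",       // K8s provisioning (KubeOne)
	"lb":          "lbconfigs", // LB config overrides
}

func main() {
	fmt.Println(templateSubdir["terraformer"])
}
```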
Next, special attention will need to be given to the kuber manifests (the addons folder), which will contain manifests for deploying and customizing Longhorn, Grafana, Prometheus, and other Claudie addons; but that will come in a different GH issue, which will be referenced here.
Feedback would be very welcome here.
My opinion is that option 2 would be better here, as it extends a kind of behavior Claudie already uses with the external git template repo.
As for where the refs linking to this new CRD should be put in the InputManifest, I agree with the first option you mentioned for each. However, requiring those refs in every NodePool (for the ones concerned) could become tedious. To fix this, maybe a Provider > Cluster > NodePool default inheritance system could be implemented, where Cluster and Provider would also have fields for the refs used in NodePools (terraform, ansible, etc.), and each level up would act as a default if the field isn't present in the level below.
This would mean that when NodePools belonging to a cluster don't have the ref but the cluster does, the NodePools without a ref of their own would inherit it from the cluster; the same applies one level up with providers.
It would also mean that someone wanting to use a single instance of this new CRD with a single provider in charge of a lot of clusters and pools would only have to set these fields once, at the provider level, and be done with it.
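The Provider > Cluster > NodePool fallback suggested above could be sketched as a first-non-empty resolution. This is purely a sketch of the commenter's idea, with invented names:

```go
package main

import "fmt"

// firstNonEmpty returns the first non-empty string, implementing the
// suggested inheritance: the most specific level that sets a ref wins.
func firstNonEmpty(refs ...string) string {
	for _, r := range refs {
		if r != "" {
			return r
		}
	}
	return ""
}

// effectiveRef resolves a nodepool's template ref given its own value
// and the cluster/provider defaults (NodePool beats Cluster beats
// Provider). Illustrative only.
func effectiveRef(nodePool, cluster, provider string) string {
	return firstNonEmpty(nodePool, cluster, provider)
}

func main() {
	// Only the provider sets a ref, so every pool inherits it.
	fmt.Println(effectiveRef("", "", "raspberry-optimized-template"))
}
```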
While the proposed approach gives the most power and control to the users, we feel it's quite complex from the UX perspective (e.g. a user might be confused about where a particular template is sourced from, when it comes to hierarchical overrides).
Let's take it from the completely opposite angle: is there a use case for having multiple template locations per single InputManifest? That is, would a user need to refer to multiple sets of template references in a single manifest, such that one reference would not suffice?