terraform-provider-kops
[bug][v1.23]: cluster_ca_cert and cluster_ca_key always trigger cluster updater
Hi again!

I just tried out v1.23, and spotted this one triggering the cluster updater. I haven't provided the secrets
block in my cluster config.
Thanks for reporting. Does it happen every time, or only when you upgrade from a previous version of the provider?
I would say every time. After a couple of applies like that, I checked again this morning and hit the same scenario on a terraform apply: revision = 13 -> 14 on kops_cluster.cluster.
Maybe useful info: as part of the upgrade I completely destroyed the 1.22 cluster and started the 1.23 one in a different AZ, so the chances of 1.22 remnants are small (although I didn't explicitly check whether the tfstate was empty/deleted before spinning up 1.23).
Just to help narrow down what might be the problem: I don't see this on my own prod cluster, where I have a docker config defined, nor on a fresh test cluster where I have no docker config.
@ddelange did you see the same problem with 1.22, or only with 1.23?
I spent some time trying to reproduce the issue, but I didn't succeed. Can you share your tf config?
Hmm, interesting! Thanks for checking, guys. This was not the case before upgrading to 1.23. The only diff to cluster.tf, apart from the version bump, was adding the containerd.config_override block. Adding a (basic-auth) private docker registry to the cluster was the whole reason for upgrading to 1.23: since 1.22 supports at most containerd 1.4, I needed 1.5+ so that we can make use of containerd's registry mirror auth functionality.
Here's an excerpt of our cluster.tf:
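The relevant addition is the containerd block. A minimal sketch of what it looks like (the registry hostname and the base64 auth string below are placeholders, not our real values):

containerd {
  # registry mirror auth requires containerd >= 1.5
  config_override = <<EOF
version = 2

[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com"]
  endpoint = ["https://registry.example.com"]

[plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".auth]
  # base64 of "user:password" (placeholder)
  auth = "dXNlcjpwYXNzd29yZA=="
EOF
}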
This looks similar to what I tested.
You used the provider v1.23.0-alpha.1 with k8s 1.23, right?
Correct, 1.23.0-alpha.1, and:
variable "kubernetes_version" {
type = string
description = "Kubernetes version to use for the cluster. MAJOR.MINOR here should not be newer than the kops provider version in versions.tf ref https://kops.sigs.k8s.io/welcome/releases/"
default = "v1.23.5"
}
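For context, that variable is wired into the cluster resource roughly like this:

resource "kops_cluster" "cluster" {
  // ...
  kubernetes_version = var.kubernetes_version
  // ...
}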
I'm now trying to isolate the bug to the config_override block (applying now without it, then seeing if another apply afterwards triggers the updater again).
I don't think your issue is related to the config override block, but thanks for trying it out.
Looks unrelated indeed. Behaviour stays the same 🤔
Could be related to permissions in the S3 bucket. Can you check the files are correctly stored in the bucket (pki/private/kubernetes-ca/keyset.yaml)?
The file exists, looks like a valid manifest. Created (r/w perms) by my AWS account yesterday when I created the 1.23 cluster. Same permissions as the rest of the files, e.g. under /instancegroups
Hmmm, I'm dry on ideas :-(
There was an invalid reference to the k/k package in v1.23.0-alpha.1, but I don't think it would cause such an issue. I can try to cut v1.23.0-alpha.2, though.
Many thanks for taking the time!
I would just close this for now; if it disappears over time (or I miraculously find a fix) I will report back here.
Have a nice weekend 💥
This hotfix seems to be holding steady :)
lifecycle {
  ignore_changes = [
    secrets,
  ]
}
It’s a hack, you shouldn’t need this. If possible I would empty the bucket and try from scratch completely.
I just released v1.23.0-alpha.2, but I doubt it will fix your issue.
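For anyone following along, picking it up is just a version bump in versions.tf; assuming the eddycharly/kops registry source, roughly:

terraform {
  required_providers {
    kops = {
      source  = "eddycharly/kops"
      version = "1.23.0-alpha.2"
    }
  }
}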
Thanks for the ping!
> If possible I would empty the bucket and try from scratch completely.
Seems like it did not solve the issue 🤔
@argoyle and I had this issue as well on some clusters.
Seems to work if we always define secrets in the config, like:
secrets {
  docker_config = "{}"
}
Interesting, I am going to reopen and investigate the issue again.
I was able to somewhat reproduce the issue:
- create a cluster with ca cert/private key
- update the cluster, removing ca cert/private key
- next applies always trigger an update
Would that look like the scenario you are hitting?
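In config terms, the first step would be roughly this (the cert/key files are placeholders), with the whole secrets block removed again in the second step:

resource "kops_cluster" "cluster" {
  // ...
  secrets {
    // placeholder paths, removed again in step 2
    cluster_ca_cert = file("${path.module}/ca.crt")
    cluster_ca_key  = file("${path.module}/ca.key")
  }
}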
fwiw, I never used the secrets block.
I have a good suspect in mind but no access to an AWS account, making it difficult to track it down.
CAs are stored in pki/private/kubernetes-ca/keyset.yaml (maybe in pki/private/ca/keyset.yaml with older kOps versions).
When one doesn't provide a CA cert/key, kOps will create one and I'm not sure if it is stored in the same place (I suppose it is but can't confirm).
From what I understand, it's not possible to remove a CA that is being used so it is probably related.
What could happen:
- You create a cluster without specifying the CA
- kOps generates a CA automatically and stores it
- Finally the CA ends up in the state
- When you run a plan, if you don't specify the secrets block, terraform tries to delete the CA
- Deleting a CA is not possible (we can eventually rotate it but not delete a CA being used)
- The CA will constantly be added back to the state and there will be a permanent diff
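So every subsequent plan would show something roughly like this (a sketch, not verbatim output):

  # kops_cluster.cluster will be updated in-place
  ~ resource "kops_cluster" "cluster" {
      ~ revision = 13 -> 14
      - secrets {
          - cluster_ca_cert = (sensitive value)
          - cluster_ca_key  = (sensitive value)
        }
    }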
Now, if you specify the secrets block (you can leave it empty), it looks like terraform is not complaining anymore (because CA cert/key are marked as computed).
resource "kops_cluster" "cluster" {
// ...
secrets {}
// ...
}
It would be nice if someone can confirm this.
@peter-svensson @argoyle I suspect you can use an empty secrets block, no need to use a dummy docker_config as mentioned in my comment above.
--- a/k8s/kops/cluster.tf
+++ b/k8s/kops/cluster.tf
@@ -217,12 +217,7 @@ EOF
     }
   }
-  lifecycle {
-    ignore_changes = [
-      secrets,
-    ]
-  }
+  secrets {}
 }
This indeed did not trigger the updater!
Seems to do the trick with our setup as well 🎉
we also have an

authorization {
  always_allow {}
}

block for the same reason btw :)
Great, thanks for testing it! I will consider making secrets required and documenting that it can be left empty.
@ddelange why not RBAC?
authorization {
  rbac {}
}