docs: KubePrism is actually not enabled by default on existing clusters
Bug Report
Despite the (now misleading) release notes and docs for 1.6, KubePrism is not enabled by default. It is only enabled by default for new clusters, i.e., those whose config is newly generated from the CLI. The original PR https://github.com/siderolabs/talos/pull/7788 also incorrectly dropped the docs showing how to enable KubePrism, which are still needed for existing clusters.
I suggest the release notes and docs be updated to reflect the actual meaning of this change.
You can use the 1.5 docs to see how to enable KubePrism, but yes, the release notes need to be corrected. Talos never enables any features on upgrade.
Hmm, but the docs for 1.6+ should not drop the instructions for configuring KubePrism, as they are still very much valid...
I've just set up a new cluster with installer version '1.6.7' and talosctl version 'v1.4.8'. I was not allowed to add machine.features.kubePrism using 'talosctl patch'. Instead I had to manually edit the config or patch the node directly. Without this there was no KubePrism.
Please use the same version of talosctl as your Talos cluster. Out-of-sync versions might not work correctly.
Yeah, that sounds right. The only installation instructions I could find at the time were pipe-to-bash. Reading through that script (as I should have done before executing it) shows that the talosctl binary is actually neatly released with everything else. So my contribution here was user error and thus off topic for this issue.
Maybe I'm missing something, but I have a cluster that started on 1.4 (I forget the patch version) and has been upgraded through 1.5.{3,4,5}, 1.6.7, and now 1.7.5. I hadn't had a chance to use KubePrism yet, but while doing some cluster configuration maintenance I came across the suggestion to change Cilium to use localhost:7445 for the API server. However, when I try this, the components are not able to make a connection. I can use localhost:6443 on the control-plane nodes, but that isn't KubePrism; it is the actual api-server pod.
I'm unable to set the KubePrism feature in the machine config, and cycling in new machines doesn't result in them having KubePrism active. For example, one machine that had gone through these upgrades still had the stable ifname disabled, since that was a post-1.4 change. Resetting this machine and applying the config results in the ifname change taking effect, but KubePrism is not active (at least, nothing is listening on 7445). The machine config setting for KubePrism is rejected, I expect because it should now be defaulting to on. KubePrism, however, is not on, and there is no way to enable it on upgraded clusters.
Talos Linux never enables new features automatically on upgrade if they are configurable. This keeps upgrades safe, in the sense that they are less surprising.
When upgrading, you can look through the release documentation to figure out how and when to enable new features (if you'd like to enable them), e.g. KubePrism.
When the docs say that it's enabled by default for new clusters, it means that machine configuration generated for Talos version >= X now enables this feature explicitly in the machine configuration.
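For illustration, the explicit setting in a machine configuration looks roughly like the fragment below. This is a sketch based on the field names in the 1.5 documentation, and the port value is the one mentioned earlier in this thread; check the configuration reference for your Talos version before applying it.

```yaml
# Sketch of the machine config fragment that enables KubePrism explicitly.
# Field names are case-sensitive: the key is kubePrism, not kubeprism.
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445   # local port the KubePrism endpoint listens on
```

On an upgraded cluster this fragment is absent unless it is added by hand, which is why the feature stays off there.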
Thanks @smira for the response. The point of my post was not to say that it should have been automatically enabled. I was under the impression, both from my own experience trying to enable it post upgrade and from a previous post in this issue, that it could not be enabled. Since it doesn't show up by default in the machine configuration, and I went to the linked PR that removed the enablement documentation, I was using the wrong configuration field. The PR has the example as kubeprism for the feature when it is actually kubePrism. The published docs for 1.5 have the right value.
The lack of the current value in the machine config seems like a reproducibility miss. If one were to try to recover a cluster starting from an etcd snapshot and the machine configs, would that not result in KubePrism being enabled, since the machine configs are lacking the now non-default value?
The machine config default values don't change over time, so there's no reproducibility problem.
If the documentation is wrong, please send a PR.