[BUG][UI] Can't save cluster with edit-as-YAML; corrupts encryption.yaml (worse in 2.7.0)
Rancher Server Setup
- Rancher version: 2.7.0
- Installation option: Docker
- Proxy/Cert Details: -
Information about the Cluster
- Kubernetes version: Does not matter
- Cluster Type (Local/Downstream): RKE
User Information
- What is the role of the user logged in? Does not matter
Describe the bug
This bug is still present: https://github.com/rancher/rancher/issues/36197
To Reproduce

1. `docker run -d --rm -p 443:443 --privileged --name rancher "rancher/rancher:v2.7.0"`
2. Open https://localhost

These steps are all in the UI:

3. Bootstrap the cluster as instructed in the UI.
4. Generate a random password.
5. Create a new cluster with RKE.
6. Use a random name for the cluster.
7. Merge the encryption config from https://rancher.com/docs/rke/latest/en/config-options/secrets-encryption/
8. Select all roles.
9. Click on Done.
10. Edit the cluster as YAML.
11. Click on Save.
12. Edit the cluster as YAML again.
Result
The config at step 10:
```yaml
resources:
  - providers:
      - aescbc:
          keys:
            - name: k-fw5hn
              secret: RTczRjFDODMwQzAyMDVBREU4NDJBMUZFNDhCNzM5N0I=
        aesgcm: {}
        identity: {}
        kms: {}
        secretbox: {}
      - aescbc: {}
        aesgcm: {}
        identity: {}
        kms: {}
        secretbox: {}
```
The config at step 12:

```yaml
resources:
  - providers:
      - aescbc:
          keys:
            - name: k-fw5hn
              secret: RTczRjFDODMwQzAyMDVBREU4NDJBMUZFNDhCNzM5N0I=
        aesgcm:
          keys: null
        identity: {}
        kms:
          endpoint: ''
          name: ''
          timeout: {}
        secretbox:
          keys: null
      - aescbc:
          keys: null
        aesgcm:
          keys: null
        identity: {}
        kms:
          endpoint: ''
          name: ''
          timeout: {}
        secretbox:
          keys: null
```
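The corrupted output above looks like what happens when the empty `{}` provider stubs are decoded into typed structs and then re-serialized with their zero values (`null` lists, empty strings) instead of being omitted. A minimal Python sketch of that suspected mechanism; all names here are illustrative stand-ins, not Rancher internals:

```python
# Sketch: an all-empty provider stub gets re-serialized with every field's
# zero value, matching the before/after difference shown above. The zero
# values below mirror the observed output; the function is a hypothetical
# stand-in for the real (de)serializer, not Rancher code.

def fill_provider_defaults(provider: dict) -> dict:
    """Replace each all-empty provider entry with its zero-value form."""
    zero_values = {
        "aescbc": {"keys": None},
        "aesgcm": {"keys": None},
        "identity": {},
        "kms": {"endpoint": "", "name": "", "timeout": {}},
        "secretbox": {"keys": None},
    }
    return {
        name: (value if value else zero_values.get(name, value))
        for name, value in provider.items()
    }

# The all-empty provider entry that was saved in the first config:
empty_provider = {"aescbc": {}, "aesgcm": {}, "identity": {}, "kms": {}, "secretbox": {}}
corrupted = fill_provider_defaults(empty_provider)
```

A provider that actually contains keys passes through untouched; only the empty stubs get expanded, which is why the first `aescbc` entry survives while everything else gains `keys: null` and empty strings.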
Expected Result
The configuration should be the same as at step 7, and thus match the configuration on this page: https://rancher.com/docs/rke/latest/en/config-options/secrets-encryption/
Screenshots
No Screenshots.
Additional context
It is worse than in version 2.6 and earlier, but it was already present there. https://github.com/rancher/rancher/issues/36197
Running the above commands with stable Rancher (v2.7.6 at the time of writing): after the steps above you can try multiple cluster configurations, with different results. The cluster created is an RKE1 custom cluster.
This version of Rancher performs several steps to ensure that the encryption key is in place. These are the steps when there is a `custom_config` in the cluster YAML:
- The custom config is extracted and written to disk.
- The custom config is also written to a secret in Kubernetes.
- The custom config is removed from the cluster YAML.
- Rancher should restart the kube-apiserver to load the new encryption.yaml.
- Rancher should start a new backup to ensure minimal etcd loss (worst case).
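The sequence above can be sketched as follows; a simplified Python illustration of the flow, with hypothetical function and key names rather than Rancher's actual code:

```python
def apply_custom_encryption_config(cluster_yaml: dict) -> dict:
    """Sketch of the steps above: extract custom_config, stage it for disk
    and for a Kubernetes secret, and strip it from the cluster YAML.
    Restarting kube-apiserver and taking an etcd snapshot are side effects
    that are only noted in comments, not modeled."""
    kube_api = cluster_yaml.get("services", {}).get("kube-api", {})
    enc = kube_api.get("secrets_encryption_config", {})
    custom = enc.pop("custom_config", None)  # step 3: removed from cluster YAML

    staged = {}
    if custom is not None:
        staged["encryption_yaml_on_disk"] = custom  # step 1: written to disk
        staged["kubernetes_secret"] = custom        # step 2: stored as a secret
        # step 4 (not modeled): restart kube-apiserver to load encryption.yaml
        # step 5 (not modeled): take an etcd snapshot to limit worst-case loss
    return staged

cluster = {"services": {"kube-api": {"secrets_encryption_config": {
    "enabled": True,
    "custom_config": {"resources": []},
}}}}
staged = apply_custom_encryption_config(cluster)
```

The observations below suggest the last two steps are exactly where this flow breaks down: the config is staged, but the apiserver restart does not reliably happen.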
Current observations of applying the secret config (keys are not real):
- Configuration used by the examples: the secrets-encryption docs page
Result with:

```yaml
secrets_encryption_config:
  custom_config:
    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
          - identity: {}
```

"view cluster as yaml" then shows:

```yaml
kubeApi:
  secretsEncryptionConfig:
    customConfig:
      apiVersion: apiserver.config.k8s.io/v1
      kind: EncryptionConfiguration
      resources: null
```
- With apiVersion and kind removed, encryption is still disabled.
Result with:

```yaml
secrets_encryption_config:
  custom_config:
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
          - identity: {}
```

"view cluster as yaml" then shows:

```yaml
secretsEncryptionConfig:
  customConfig:
    resources: null
```
- With `enabled: true`:

It does not matter whether you use the YAML from the first or the second test.

Results:
a. The same result as in the second test.
b. Rancher gets into a never-ending loop of applying the secret encryption.
c. The cluster never finishes updating, making other changes to the cluster impossible.

```yaml
secrets_encryption_config:
  enabled: true
  custom_config:
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
          - identity: {}
```
Workaround:
- Edit the cluster as yaml.
- Remove the empty entry and save the cluster.
```yaml
secrets_encryption_config:
  enabled: true
```
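The same cleanup can be done programmatically before saving. A naive Python sketch (purely illustrative, not a supported Rancher tool) that strips the all-empty provider entries the UI round-trip leaves behind:

```python
def strip_empty_providers(config: dict) -> dict:
    """Remove provider entries whose fields are all empty ({}, '', None, []),
    mirroring the manual 'remove the empty entry' workaround above.
    Naive sketch: a deliberately empty-but-valid provider such as
    {'identity': {}} on its own would also be stripped."""
    def is_empty(value) -> bool:
        if isinstance(value, dict):
            return all(is_empty(v) for v in value.values())
        return value in (None, "", [], {})

    for resource in config.get("resources", []):
        resource["providers"] = [
            p for p in resource.get("providers", []) if not is_empty(p)
        ]
    return config

# The corrupted shape from above: one real provider, one all-empty stub.
config = {"resources": [{"providers": [
    {"aescbc": {"keys": [{"name": "k-fw5hn",
                          "secret": "RTczRjFDODMwQzAyMDVBREU4NDJBMUZFNDhCNzM5N0I="}]}},
    {"aescbc": {"keys": None}, "aesgcm": {"keys": None}, "identity": {},
     "kms": {"endpoint": "", "name": "", "timeout": {}},
     "secretbox": {"keys": None}},
]}]}
cleaned = strip_empty_providers(config)  # only the real provider remains
```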
Other issues worth mentioning:
- The documentation's examples do not work.
- The indentation of the example `encryption_config` is broken.
- Inconsistent casing: `Providers` vs `resources`, pick one? More keys have the same issue: `aescbc` vs `AESCBC`, `Keys` vs `keys`, etc.
- The two example YAMLs at custom-at-rest-data-encryption-configuration look different from each other.
- When there is an error in the cluster YAML and you hit save, the cluster config is still applied. This results in the UI loading invalid YAML, and you have to edit the cluster as YAML again.
- The above results are from "view cluster as yaml"; when you "edit cluster as yaml", `null` becomes an empty dictionary:

```yaml
secretsEncryptionConfig:
  customConfig:
    resources: null
```

becomes

```yaml
secrets_encryption_config:
  custom_config: {}
  enabled: false
```

- The secret encryption is written to disk, but the kube-apiserver is not restarted. When upgrading the cluster with new keys (rollover), there is a chance that the kube-apiserver is not running with the latest keys. This will break the cluster.
- Saving the cluster without changes does not restart the kube-apiserver.
- The snapshot contains both the encryption key and the etcd backup. Anyone who obtains a backup can use it to decrypt the etcd data (even if it isn't their own etcd backup).
- Re-encrypting the secrets during key rollover with RKE is very slow. Doing the same directly against the etcd database is very fast.
This is the result for a small cluster:
```
# RKE1
7:58:28 am [INFO ] [rewrite-secrets] 50 secrets rewritten
8:04:46 am [INFO ] [rewrite-secrets] Operation completed, 1878 secrets rewritten
# 378 seconds

# Control plane
$ time docker exec -it kubelet bash -c "kubectl --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-controller-manager.yaml get secrets --all-namespaces -o json | kubectl --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-controller-manager.yaml replace -f -"

real    0m21.865s
user    0m0.100s
sys     0m0.138s
```
~Possibly related to #11020~
Issue pre-dates that bug
Bumping to 2.10.0, as not a regression.
Should consider backport
RKE1 will be end of life shortly, so closing as won't fix