[BUG][UI] Can't save cluster with edit as YAML, corrupts encryption.yaml (worse in 2.7.0)

Open BobVanB opened this issue 3 years ago • 3 comments

Rancher Server Setup

  • Rancher version: 2.7.0
  • Installation option: Docker
  • Proxy/Cert Details: -

Information about the Cluster

  • Kubernetes version: Does not matter
  • Cluster Type (Local/Downstream): RKE

User Information

  • What is the role of the user logged in? Does not matter

Describe the bug

This bug is still present: https://github.com/rancher/rancher/issues/36197

To Reproduce

  1. docker run -d --rm -p 443:443 --privileged --name rancher "rancher/rancher:v2.7.0"
  2. open https://localhost

These steps are all in the ui:

  1. bootstrap the cluster as told in the ui
  2. generate a random password
  3. create a new cluster with RKE
  4. use a random name for the cluster
  5. merge the encryption config from https://rancher.com/docs/rke/latest/en/config-options/secrets-encryption/
  6. select all roles
  7. click Done
  8. edit the cluster as yaml
  9. click on save
  10. edit the cluster as yaml

Result

The config at the first "edit the cluster as yaml" step:

resources:
  - providers:
      - aescbc:
          keys:
            - name: k-fw5hn
              secret: RTczRjFDODMwQzAyMDVBREU4NDJBMUZFNDhCNzM5N0I=
        aesgcm: {}
        identity: {}
        kms: {}
        secretbox: {}
      - aescbc: {}
        aesgcm: {}
        identity: {}
        kms: {}
        secretbox: {}

The config at the second "edit the cluster as yaml" step, after saving:

resources:
  - providers:
      - aescbc:
          keys:
            - name: k-fw5hn
              secret: RTczRjFDODMwQzAyMDVBREU4NDJBMUZFNDhCNzM5N0I=
        aesgcm:
          keys: null
        identity: {}
        kms:
          endpoint: ''
          name: ''
          timeout: {}
        secretbox:
          keys: null
      - aescbc:
          keys: null
        aesgcm:
          keys: null
        identity: {}
        kms:
          endpoint: ''
          name: ''
          timeout: {}
        secretbox:
          keys: null

Expected Result

The configuration should be the same as the one merged in at the "merge the encryption config" step, and thus the same as the configuration on this page: https://rancher.com/docs/rke/latest/en/config-options/secrets-encryption/
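
The difference between the expected and the saved shape can be checked mechanically. A minimal sketch, assuming the Kubernetes rule that each providers entry names exactly one provider type; the helper names are hypothetical and the key material is replaced by a placeholder:

```python
# Provider types that appear as keys in the configs shown above.
PROVIDER_TYPES = ("aescbc", "aesgcm", "identity", "kms", "secretbox")


def provider_types(entry):
    """Return the provider types present as keys in one providers entry."""
    return [name for name in PROVIDER_TYPES if name in entry]


def malformed_entries(providers):
    """Return the indexes of entries that do not name exactly one provider type."""
    return [i for i, entry in enumerate(providers) if len(provider_types(entry)) != 1]


# Shape of the documented example: one aescbc entry, then an identity fallback.
expected = [
    {"aescbc": {"keys": [{"name": "key1", "secret": "<base64 key>"}]}},
    {"identity": {}},
]

# Shape shown in the YAML editor after the cluster was created.
after_save = [
    {"aescbc": {"keys": [{"name": "k-fw5hn", "secret": "<base64 key>"}]},
     "aesgcm": {}, "identity": {}, "kms": {}, "secretbox": {}},
    {"aescbc": {}, "aesgcm": {}, "identity": {}, "kms": {}, "secretbox": {}},
]

print(malformed_entries(expected))    # []
print(malformed_entries(after_save))  # [0, 1]
```

In the saved config every entry carries all five provider keys, so the original identity: {} fallback can no longer be told apart from the empty placeholders.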

Screenshots

No Screenshots.

Additional context

It is worse than in version 2.6 and lower, but the bug was already present there. https://github.com/rancher/rancher/issues/36197

BobVanB, Jan 19 '23 19:01

I ran the above commands with stable Rancher, which at the time of writing is v2.7.6. After the steps above you can try multiple cluster configurations, with different results. The cluster created is an RKE1 custom cluster.

This version of Rancher takes a couple of steps to ensure that the encryption key is in place. These are the steps when there is a custom_config in the cluster yaml.

  1. The custom config is extracted and written to disk.
  2. The custom config is also written to a secret in kubernetes.
  3. The custom config is removed from the cluster yaml.
  4. Rancher should restart the kube-apiserver, to load the new encryption.yaml
  5. Rancher should start a new backup to ensure minimal loss of etcd data. (worst case)
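
Whether step 4 actually happened can be checked on a node by comparing the start time of the kube-apiserver container with the modification time of the encryption config. A minimal sketch, assuming the RKE1 container is named kube-apiserver and the file lives at /etc/kubernetes/ssl/encryption.yaml; both are assumptions, and the helper names and sample values are illustrative only:

```python
from datetime import datetime, timezone


def parse_docker_time(stamp):
    """Parse a Docker RFC 3339 timestamp such as 2023-09-25T07:00:00.123456789Z."""
    head, _, frac = stamp.rstrip("Z").partition(".")
    # Docker reports nanoseconds; keep only the microsecond part.
    micro = int((frac + "000000")[:6])
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)


def apiserver_is_stale(started_at, config_mtime):
    """Return True when the config file changed after kube-apiserver last started."""
    changed = datetime.fromtimestamp(config_mtime, tz=timezone.utc)
    return changed > parse_docker_time(started_at)


# Illustrative values, not taken from a real node.
started_at = "2023-09-25T07:00:00.5Z"
config_mtime = datetime(2023, 9, 25, 7, 58, 28, tzinfo=timezone.utc).timestamp()

print(apiserver_is_stale(started_at, config_mtime))  # True
```

On a node the two inputs would come from docker inspect -f '{{.State.StartedAt}}' kube-apiserver and from the mtime of the encryption config file.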

Current observations when applying the secret config (keys are not real):

  1. Configuration used by the example: secrets-encryption
      secrets_encryption_config:
        custom_config:
          apiVersion: apiserver.config.k8s.io/v1
          kind: EncryptionConfiguration
          resources:
          - resources:
            - secrets
            providers:
            - aescbc:
                keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
            - identity: {}
    
    Result with view cluster as yaml
      kubeApi:
        secretsEncryptionConfig:
          customConfig:
            apiVersion: apiserver.config.k8s.io/v1
            kind: EncryptionConfiguration
            resources: null
    

  2. apiVersion and kind removed; encryption is still disabled
      secrets_encryption_config:
        custom_config:
          resources:
          - resources:
            - secrets
            providers:
            - aescbc:
                keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
            - identity: {}
    
    Result with view cluster as yaml
        secretsEncryptionConfig:
          customConfig:
            resources: null
    

  3. With enabled: true. It does not matter whether the YAML is from the first or the second test.
      secrets_encryption_config:
        enabled: true
        custom_config:
          resources:
          - resources:
            - secrets
            providers:
            - aescbc:
                keys:
                - name: key1
                  secret: SRKtKHDdXerjDtDi112w8nTmQ/Gx9rc6Cgm36gakVgM=
            - identity: {}
    
    Results:
      a. This is the same result as the second test.
      b. Rancher gets into a never-ending loop of applying the secret encryption.
      c. The cluster never finishes updating, which makes other changes to the cluster impossible.
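
The view-as-YAML results use camelCase keys while the submitted config uses snake_case, which makes them awkward to compare by eye. A minimal sketch of a normalizer that shows what is lost; the helper names are hypothetical and the key is replaced by a placeholder:

```python
import re


def to_snake(name):
    """Convert a camelCase key such as customConfig to custom_config."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize(value):
    """Recursively convert dict keys to snake_case and drop null values."""
    if isinstance(value, dict):
        return {to_snake(k): normalize(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


# Config as submitted in the second test.
submitted = {"secrets_encryption_config": {"custom_config": {"resources": [
    {"resources": ["secrets"],
     "providers": [
         {"aescbc": {"keys": [{"name": "key1", "secret": "<base64 key>"}]}},
         {"identity": {}},
     ]},
]}}}

# Result reported by view cluster as yaml.
viewed = {"secretsEncryptionConfig": {"customConfig": {"resources": None}}}

print(normalize(viewed))
# {'secrets_encryption_config': {'custom_config': {}}}
print(normalize(viewed) == normalize(submitted))  # False
```

After normalization the viewed config is left with an empty custom_config, while the submitted one still holds the full resources list.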

Workaround:

  1. Edit the cluster as yaml.
  2. Remove the empty custom_config entry, leaving only the following, and save the cluster.
      secrets_encryption_config:
        enabled: true
    

Other issues worth mentioning:

  • The documentation's examples do not work.
    • The indentation of the example encryption_config is off.
    • Inconsistent casing: Providers vs resources, pick one? More keys have the same issue: aescbc vs AESCBC, Keys vs keys, etc.
    • The two example YAMLs at custom-at-rest-data-encryption-configuration differ from each other.
  • When there is an error in the cluster yaml and you hit save, the cluster config is still applied. The yaml then loaded in the UI is invalid, and you have to edit the cluster as yaml again.
  • The above results are from view cluster as yaml. When you edit the cluster as yaml, 'null' becomes an empty dictionary:
        secretsEncryptionConfig:
          customConfig:
            resources: null
    
      secrets_encryption_config:
        custom_config: {}
        enabled: false
    
  • The secret encryption is written to disk, but the kube-apiserver is not restarted. When upgrading the cluster with new keys, (rollover), there is a chance that kube-apiserver is not loaded with the latest keys. This will break the clusters.
  • Saving the cluster without changes does not restart the kube-apiserver.
  • The snapshot contains both the encryption key and the etcd backup. Anyone who obtains a backup can use the key to decrypt the etcd data (even if it isn't their etcd backup).
  • Re-encrypting the secrets during key rollover with rke is very slow. Doing the same directly on the control plane with kubectl is very fast. This is the result for a small cluster:
    # RKE1
    7:58:28 am [INFO ] [rewrite-secrets] 50 secrets rewritten
    8:04:46 am [INFO ] [rewrite-secrets] Operation completed, 1878 secrets rewritten
    # 378 seconds
    
    # Control plane
    $ time docker exec -it kubelet bash -c "kubectl --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-controller-manager.yaml get secrets --all-namespaces -o json | kubectl --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-controller-manager.yaml replace -f -"
    real    0m21.865s
    user    0m0.100s
    sys     0m0.138s
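
The gap between the two measurements can be worked out from the figures above. A minimal sketch using only the two log timestamps and the reported real time; the variable names are illustrative:

```python
from datetime import timedelta

# RKE1 log: first and last rewrite-secrets lines shown above.
rke_start = timedelta(hours=7, minutes=58, seconds=28)
rke_end = timedelta(hours=8, minutes=4, seconds=46)
rke_seconds = (rke_end - rke_start).total_seconds()

# "real" time reported for the kubectl get | kubectl replace pipeline.
kubectl_seconds = 21.865

speedup = rke_seconds / kubectl_seconds
print(rke_seconds)        # 378.0
print(round(speedup, 1))  # 17.3
```

The 378 seconds are counted from the first log line shown, which already reports 50 rewritten secrets, so the full RKE run took at least that long.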
    

BobVanB, Sep 25 '23 09:09

~Possibly related to #11020~

Issue pre-dates that bug

gaktive, May 29 '24 13:05

Bumping to 2.10.0, as not a regression.

Should consider backport

nwmac, Jul 02 '24 09:07

RKE1 will be end of life shortly, so closing as won't fix

nwmac, May 31 '25 21:05