rancher
[BUG] Digital Ocean Managed Kubernetes Upgrade Problem
Rancher Server Setup
- Rancher version: v2.7.6
- Installation option (Docker install/Helm Chart): Docker install
- Proxy/Cert Details:
Information about the Cluster
- Kubernetes version: 1.25.14
- Cluster Type (Local/Downstream): Imported
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
- If custom, define the set of permissions:
Describe the bug
I am getting an error when trying to upgrade the cluster to v1.26.7.
To Reproduce
Result
Expected Result
The upgrade should be possible, as Kubernetes version 1.26.7 is supported by Rancher v2.7.6.
Screenshots
Additional context
Hi,
I got a response from DigitalOcean. May I ask whether this change would have any ill effect on Rancher after the update?
$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep failurePolicy:
failurePolicy: Ignore
failurePolicy: Ignore
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Ignore
$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep timeoutSeconds:
timeoutSeconds: 10
timeoutSeconds: 10
timeoutSeconds: 10
timeoutSeconds: 10
timeoutSeconds: 10
timeoutSeconds: 10
timeoutSeconds: 10
Recommended change: Can you ensure that the webhooks for your cluster have failurePolicy set to 'Ignore'? You can use the command below to list the validatingwebhookconfigurations.
kubectl get validatingwebhookconfigurations
Then edit the failurePolicy to 'Ignore' using:
kubectl edit validatingwebhookconfigurations rancher.cattle.io
@alberk8 These checks are needed to ensure Rancher's security, so we cannot change the failure policy to Ignore. You can do this yourself as a workaround, but it will leave your Rancher instance vulnerable to known CVEs while the rancher-webhook pod is not running.
We will need to come up with steps to ensure a secure upgrade, which will require some triage on our end.
I find that manually editing the failure policy only lasts for a short while (about 30 minutes) before it gets reverted to the default.
@alberk8 Correct. When the rancher-webhook pod is started again, it will revert the policy to its desired configuration. So it's an acceptable workaround for some users, but we cannot officially recommend removing security checks, even if it is only for a small window.
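Since the edit only holds until rancher-webhook restores its configuration, it can help to check whether the revert has already happened before starting the upgrade. This is a minimal sketch, not an official Rancher procedure: `has_fail_policy` is a hypothetical helper name, and it only inspects the YAML it is given on stdin.

```shell
# Succeeds (exit status 0) if any webhook in the YAML read from stdin
# still has, or has been reverted to, failurePolicy: Fail.
# has_fail_policy is a hypothetical helper name, not part of kubectl or Rancher.
has_fail_policy() {
  grep -q 'failurePolicy:[[:space:]]*Fail'
}

# Example usage, assuming kubectl is pointed at the affected cluster:
#   if kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | has_fail_policy; then
#     echo "failurePolicy is Fail again; the workaround has been reverted"
#   fi
```

If the check succeeds, the policy has to be changed again before retrying the upgrade.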
Issue
I am also running into this issue with DigitalOcean when updating my Kubernetes cluster to v1.28.
Currently the only thing I can do is set the failurePolicy to Ignore for the 30 minutes it lasts and run the upgrade manually instead of waiting for DO to do it.
Expected Result: The upgrade should be possible, as Kubernetes version 1.28.11 is supported by Rancher v2.8.5.
Or is there another solution to this problem?
@alberk8 How did you solve it?
This is what I did.
1) Open the DigitalOcean Kubernetes Update page first so that you are ready, because Rancher will revert the setting fairly quickly
2) Check the Kubernetes Configuration
$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep failurePolicy:
failurePolicy: Ignore
failurePolicy: Ignore
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Fail
failurePolicy: Ignore
3) Edit the Kubernetes Config
kubectl edit validatingwebhookconfigurations rancher.cattle.io
# Change these 'failurePolicy: Fail' to 'failurePolicy: Ignore'
# failurePolicy: Ignore allows the cluster to communicate while the webhook is unresponsive.
4) On the DO Kubernetes Update page, click Run Again
5) If there is no blocking error, just click Update. You might have to repeat these steps if your Kubernetes version is still behind.
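Because the setting is reverted quickly, step 3 can also be done without opening an editor. The following is only a sketch of that idea: `relax_failure_policy` is a hypothetical helper name, and piping the result into `kubectl replace -f -` is an assumption about how you would apply it, not something confirmed in this thread. As noted above, this leaves Rancher without its webhook checks until the policy is restored.

```shell
# Rewrite every "failurePolicy: Fail" in the YAML read from stdin to
# "failurePolicy: Ignore", leaving all other lines untouched.
# relax_failure_policy is a hypothetical helper name.
relax_failure_policy() {
  sed 's/failurePolicy:[[:space:]]*Fail[[:space:]]*$/failurePolicy: Ignore/'
}

# Example usage, assuming kubectl is pointed at the affected cluster:
#   kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml \
#     | relax_failure_policy \
#     | kubectl replace -f -
```

Afterwards, rerun the grep from step 2 to confirm that every entry shows Ignore, then continue with step 4.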