
[BUG] Digital Ocean Managed Kubernetes Upgrade Problem

Open alberk8 opened this issue 2 years ago • 7 comments

Rancher Server Setup

  • Rancher version: V2.7.6

  • Installation option (Docker install/Helm Chart): Docker Install

  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: 1.25.14
  • Cluster Type (Local/Downstream): Imported

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions:

Describe the bug

I am getting an error when trying to upgrade the cluster to v1.26.7.

To Reproduce

Result

Expected Result

The upgrade should be possible, as Kubernetes version 1.26.7 is supported by Rancher v2.7.6.

Screenshots

image

Additional context

alberk8 avatar Nov 24 '23 09:11 alberk8

Hi,

I got a response from Digital Ocean. May I ask whether this change would have any ill effect on Rancher after the update?

$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep failurePolicy:
  failurePolicy: Ignore
  failurePolicy: Ignore
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Ignore

$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep timeoutSeconds:
  timeoutSeconds: 10
  timeoutSeconds: 10
  timeoutSeconds: 10
  timeoutSeconds: 10
  timeoutSeconds: 10
  timeoutSeconds: 10
  timeoutSeconds: 10

Recommended Change: Can you ensure that the webhooks for your cluster have failurePolicy set to 'Ignore'? You can use the command below to list the validatingwebhookconfigurations.

kubectl get validatingwebhookconfigurations

Then edit the failurePolicy to 'Ignore' using: kubectl edit validatingwebhookconfigurations rancher.cattle.io
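
The grep output above does not show which webhook each failurePolicy belongs to. A listing by name may make it easier to see what the edit would touch. This is only a sketch: the function name list_failure_policies is hypothetical and not part of kubectl or Rancher, and it assumes kubectl is pointed at the affected cluster.

```shell
# Hypothetical helper (not from kubectl or Rancher): print each webhook's
# name and failurePolicy, one webhook per line, for the given configuration.
# Defaults to the rancher.cattle.io configuration discussed in this issue.
list_failure_policies() {
  kubectl get validatingwebhookconfigurations "${1:-rancher.cattle.io}" \
    -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\n"}{end}'
}

# Usage (requires access to the cluster):
#   list_failure_policies
```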

alberk8 avatar Nov 28 '23 02:11 alberk8

@alberk8 These checks are needed to keep rancher/rancher secure, so we cannot change the failure policy to Ignore. You can do this for your personal use as a workaround, but you will leave your Rancher instance vulnerable to known CVEs while the rancher-webhook pod is not running.

We will need to come up with steps to ensure a secure upgrade which will require some triage on our end.

KevinJoiner avatar Dec 05 '23 18:12 KevinJoiner

I find that manually editing the failure policy lasts only a short while (30 minutes) before it gets reverted to the default.

alberk8 avatar Dec 06 '23 08:12 alberk8

@alberk8 Correct. When the rancher-webhook pod is started again, it reverts the policy to its desired configuration. So it's an acceptable workaround for some users, but we cannot officially recommend removing security checks, even if it is only for a small window.

KevinJoiner avatar Dec 06 '23 14:12 KevinJoiner

Issue

I am also running into this issue with DigitalOcean updating my Kubernetes cluster to v1.28.

image

Currently the only thing I can do is set the failurePolicy to Ignore for the 30 minutes and run the upgrade manually instead of waiting for DO to do it.

Expected Result

The upgrade should be possible, as Kubernetes version 1.28.11 is supported by Rancher v2.8.5.

Or is there another solution to this problem?

jankenr avatar Aug 01 '24 13:08 jankenr

@alberk8 How did you solve it?

jankenr avatar Aug 01 '24 13:08 jankenr

@alberk8 How did you solve it?

This is what I did.

1) Open the Digital Ocean Kubernetes Update page first to get ready, because Rancher will revert the setting pretty quickly

2) Check the Kubernetes Configuration

$ kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml | grep failurePolicy:
  failurePolicy: Ignore
  failurePolicy: Ignore
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Fail
  failurePolicy: Ignore

3) Edit the Kubernetes Config

kubectl edit validatingwebhookconfigurations rancher.cattle.io

# Change each 'failurePolicy: Fail' to 'failurePolicy: Ignore'
# With failurePolicy: Ignore, API requests are still admitted while the webhook is unresponsive.

4) On the DO Kubernetes Update page, click Run Again

5) If there is no blocking error, just click Update. You might have to repeat these steps for each upgrade if your Kubernetes version is behind.
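
Since the setting gets reverted quickly, a non-interactive variant of step 3 might save some time. The following is a rough sketch, not something confirmed in this thread: build_ignore_patch is a hypothetical helper that prints a JSON Patch setting failurePolicy to Ignore for the first N webhook entries, and the count of 7 in the usage line is taken from the grep output above and may differ on other clusters. The security caveat mentioned earlier applies here as well.

```shell
# Hypothetical helper: emit a JSON Patch that sets failurePolicy to Ignore
# for webhook entries 0..N-1 of a ValidatingWebhookConfiguration.
build_ignore_patch() {
  count="$1"
  patch=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Separate patch operations with commas after the first one.
    if [ -n "$patch" ]; then
      patch="${patch},"
    fi
    patch="${patch}{\"op\":\"replace\",\"path\":\"/webhooks/${i}/failurePolicy\",\"value\":\"Ignore\"}"
    i=$((i + 1))
  done
  printf '[%s]\n' "$patch"
}

# Usage (assumes 7 webhook entries, as in the output above; adjust as needed):
#   kubectl patch validatingwebhookconfigurations rancher.cattle.io \
#     --type=json -p "$(build_ignore_patch 7)"
```

Running the check from step 2 again afterwards should show whether the change is still in place before starting the upgrade.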

alberk8 avatar Aug 02 '24 02:08 alberk8