cluster-api-provider-bringyourownhost

Remove host from inventory

[Open] kerenbenzion opened this issue 3 years ago • 13 comments

I am trying to use the BYOH provider and ran into an issue: I cannot delete a host that is no longer in use. I uninstall the agent on the host side and run delete on the management side, and I get the following error.

The command I am running:

  • kubectl delete byohosts.infrastructure.cluster.x-k8s.io lab-preprod-k8s-worker-lab-5

The result:

  • Error from server (Forbidden): admission webhook "vbyohost.kb.io" denied the request: byohost.infrastructure.cluster.x-k8s.io "lab-preprod-k8s-worker-lab-5" is forbidden: cannot delete ByoHost when MachineRef is assigned

I have also tried to edit the ByoHost object, but the change is not committed when I delete the machineRef.

Can you let me know what I should do?

  • Kubernetes version (kubectl version --short): Client Version: v1.22.4, Server Version: v1.22.4
  • OS: Ubuntu 20.04

kerenbenzion, Dec 21 '21 08:12

Hi @kerenbenzion, there is a validating webhook that rejects deleting a ByoHost CR whose status.MachineRef points to a ByoMachine. The Host-Agent should be running; you can then try to scale down the MD/KCP.

Note that removing a specific ByoHost/ByoMachine from the workload cluster is not supported as of now.

dharmjit, Dec 22 '21 09:12

@dharmjit, if I understand correctly, we can currently only scale up? We cannot decrease the number of hosts once a MachineRef has been assigned? Will this be supported?

kerenbenzion, Dec 22 '21 09:12

> currently we can do only scale up

We can scale both the KCP and the MD up and down, as per the CAPI contract. You can use either of the following to update the number of replicas:

  • kubectl scale kcp|md
  • kubectl edit kcp|md

> We cannot decrease number of hosts.

If I understand your question correctly, you can detach hosts from the cluster with a scale-down operation. Once you have identified the host that was removed, you can stop the Host-Agent and delete that specific ByoHost CR.
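A minimal Go sketch of the "identify the host that was removed" step. It assumes you have captured two hypothetical snapshots mapping each ByoHost name to the name in its status.MachineRef (empty when unset), one before and one after the scale-down; all host and machine names are illustrative. A host that was claimed before and is unclaimed afterwards is the one you can stop the Host-Agent on and delete.

```go
package main

import (
	"fmt"
	"sort"
)

// releasedBy returns, sorted by name, the hosts that had a MachineRef in the
// before snapshot but have none in the after snapshot. A host missing from
// after also counts, since a missing key reads as "".
func releasedBy(before, after map[string]string) []string {
	var out []string
	for name, ref := range before {
		if ref != "" && after[name] == "" {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical snapshots taken around a MachineDeployment scale from 3 to 2.
	before := map[string]string{
		"byoh-1": "md-0-q9r4t",
		"byoh-2": "md-0-h5m8w",
		"byoh-3": "md-0-x7k2p",
		"byoh-4": "", // already free, so it is not reported
	}
	after := map[string]string{
		"byoh-1": "md-0-q9r4t",
		"byoh-2": "",
		"byoh-3": "md-0-x7k2p",
		"byoh-4": "",
	}
	fmt.Println(releasedBy(before, after)) // [byoh-2]
}
```

Comparing snapshots, rather than listing every unclaimed host, avoids deleting hosts that were idle in the pool all along.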

dharmjit, Dec 22 '21 09:12

Hi,

I am trying to understand the following: say one of my hosts (worker or control plane) has an issue (physical or OS) and I need some time to fix it. Can I remove the ByoHost until the problem is fixed?

kerenbenzion, Dec 22 '21 09:12

> Can I remove the ByoHost until the problem is fixed?

Removing a specific host is not supported as of now.

dharmjit, Dec 22 '21 10:12

Do you know if or when it will be supported?

kerenbenzion, Dec 22 '21 10:12

Hey @kerenbenzion

Some context on why you are forbidden from deleting a ByoHost directly: we want to avoid accidental deletion of ByoHosts from the cluster, which can disrupt your cluster state and the workloads scheduled on that node.

Just reiterating what @dharmjit mentioned: for any maintenance activity on the host, you should first do a scale-down operation. As of now we can't choose a particular host while scaling down, so your best options are to tear down the cluster, so that all hosts are released back to the capacity pool, or to scale down (control plane or worker) and hope your host is the one being deprovisioned. Since MachineRef is set in the ByoHost's Status field, I'm not quite sure you can edit it; that is why you are not able to commit your changes.

When doing a scale-down operation or deprovisioning the cluster, check the agent logs: once you see something like "MachineRef not set", you are free to issue the kubectl delete byohost command.

> Do you know if or when it will be supported?

This is a valid use case, and thank you for pointing it out. ~~I will open an issue regarding this (most probably after the holidays 😄 )~~

When the team is back, we will look at prioritizing this issue and including it in one of the releases.

anusha94, Dec 22 '21 10:12

Hi,

> I am trying to understand the following: say one of my hosts (worker or control plane) has an issue (physical or OS) and I need some time to fix it. Can I remove the ByoHost until the problem is fixed?

There is a way to do this, but be careful: it is not recommended in a production environment. Delete all BYOH provider resources first, and then clean up your cluster resources completely. That should resolve your current dilemma.

huchen2021, Jan 17 '22 03:01

Hi @kerenbenzion, we were not aware of this earlier, but CAPI already provides a way to delete a specific Machine. You could perform the following steps:

  1. Identify the machine to delete via kubectl get machines
  2. Annotate the machine via kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"
  3. Scale down the KCP/MD via kubectl scale md md-name --replicas=1 (set --replicas to one less than the current count)
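Why the annotation lets you choose the host: as we understand CAPI's scale-down behaviour, a Machine carrying the cluster.x-k8s.io/delete-machine annotation is prioritised for deletion when replicas are reduced. The Go sketch below models only that ordering. The Machine type is a hypothetical trimmed-down view, and the name-order fallback exists only to keep the sketch deterministic; CAPI's real MachineSet deletion policies (Random by default, Newest, Oldest) also weigh factors such as machine health.

```go
package main

import (
	"fmt"
	"sort"
)

// deleteMachineAnnotation is the annotation key used in the steps above.
const deleteMachineAnnotation = "cluster.x-k8s.io/delete-machine"

// Machine is a hypothetical, trimmed-down view of a CAPI Machine.
type Machine struct {
	Name        string
	Annotations map[string]string
}

// pickForScaleDown models which n machines go away when replicas drop by n.
// Annotated machines come first (only the key's presence is checked); the
// remaining slots are filled in name order purely for determinism.
func pickForScaleDown(machines []Machine, n int) []string {
	if n <= 0 {
		return nil
	}
	sorted := append([]Machine(nil), machines...)
	sort.SliceStable(sorted, func(i, j int) bool {
		_, ai := sorted[i].Annotations[deleteMachineAnnotation]
		_, aj := sorted[j].Annotations[deleteMachineAnnotation]
		if ai != aj {
			return ai // annotated machines sort before unannotated ones
		}
		return sorted[i].Name < sorted[j].Name
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	names := make([]string, 0, n)
	for _, m := range sorted[:n] {
		names = append(names, m.Name)
	}
	return names
}

func main() {
	machines := []Machine{
		{Name: "md-0-aaaaa"},
		{Name: "md-0-zzzzz", Annotations: map[string]string{deleteMachineAnnotation: "yes"}},
	}
	// Scaling from 2 to 1 replica removes one machine: the annotated one.
	fmt.Println(pickForScaleDown(machines, 1)) // [md-0-zzzzz]
}
```

Once that Machine is gone, its ByoMachine releases the ByoHost, and the host can be deleted as described earlier in the thread.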

Let us know if this works for you.

dharmjit, Apr 07 '22 12:04

This might be an absolute hack, but I got around deleting stale ByoHosts by temporarily removing the ValidatingWebhookConfiguration for ByoHosts.

keshavcruise, Mar 03 '23 01:03

I had a similar issue when removing a ByoHost:

k delete byoh byoh-1

Error from server (cannot delete ByoHost when MachineRef is assigned): admission webhook "vbyohost.kb.io" denied the request: cannot delete ByoHost when MachineRef is assigned

Then I tried to remove the machineRef:

k edit byoh byoh-1

machineRef:
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: ByoMachine
  name: byoh-cluster-control-plane-jx95s
  namespace: default
  uid: f36c4978-089b-42a3-8d05-a4148f25b5a8

and got this error:

error: byohosts.infrastructure.cluster.x-k8s.io "byoh-1" could not be patched: Internal error occurred: failed calling webhook "vbyohost.kb.io": failed to call webhook: Post "https://byoh-webhook-service.byoh-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1beta1-byohost?timeout=10s": EOF

k -n byoh-system logs -f byoh-controller-manager-6c4555b4dd-szlkp

2023/05/15 21:53:25 http: panic serving 10.42.0.1:48090: runtime error: index out of range [2] with length 2
goroutine 1881 [running]:
net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0x1ae6a20, 0xc000b4a558})
        /usr/local/go/src/runtime/panic.go:844 +0x258
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).handleCreateUpdate(0xc00044fc48, 0xc000c5adc0)
        /workspace/apis/infrastructure/v1beta1/byohost_webhook.go:59 +0x6d5
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
        /workspace/apis/infrastructure/v1beta1/byohost_webhook.go:35 +0x117
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/webhook/admission/webhook.go:146 +0xa2
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00083b180, {0x7fe8940926b8?, 0xc00047c690}, 0xc00066f100)
        /go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/webhook/admission/http.go:99 +0xe90
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fe8940926b8, 0xc00047c690}, 0x1f2ee00?)
        /go/pkg/mod/github.com/prometheus/client_golang@<version>/prometheus/promhttp/instrument_server.go:40 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1f2ee78?, {0x7fe8940926b8?, 0xc00047c690?}, 0xc000c5ba38?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1f2ee78?, 0xc0001880e0?}, 0xc00066f100)
        /go/pkg/mod/github.com/prometheus/client_golang@<version>/prometheus/promhttp/instrument_server.go:117 +0xaa
net/http.HandlerFunc.ServeHTTP(0x2d0b040?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000c5b9c0?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /go/pkg/mod/github.com/prometheus/client_golang@<version>/prometheus/promhttp/instrument_server.go:84 +0xbf
net/http.HandlerFunc.ServeHTTP(0x7573f31c1d?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000386070?)
        /usr/local/go/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00067a33f?, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /usr/local/go/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0x1f21638?}, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
        /usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000782280, {0x1f2fca8, 0xc00084bbc0})
        /usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3071 +0x4db

BTW, this happens both with the cluster in place and after removing the cluster.
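The key line of the trace is "index out of range [2] with length 2" in handleCreateUpdate: the webhook read the third element of a two-element slice, the handler panicked, and the API server saw the connection drop, hence the EOF on kubectl edit. The trace does not show which slice is involved, so the following Go sketch is a hypothetical illustration of this class of bug, not the provider's code. The helper and the three-part "byoh:host:<name>" username format are assumptions; the point is that checking the length before indexing turns a two-part name such as "system:admin" into an error instead of a panic.

```go
package main

import (
	"fmt"
	"strings"
)

// hostFromUsername is a HYPOTHETICAL helper: it expects an assumed three-part,
// colon-separated username and returns the last part. The length check is
// what prevents "index out of range [2] with length 2" for two-part names.
func hostFromUsername(username string) (string, error) {
	parts := strings.Split(username, ":")
	if len(parts) < 3 { // parts[2] needs at least three elements
		return "", fmt.Errorf("unexpected username %q: want 3 colon-separated parts, got %d", username, len(parts))
	}
	return parts[2], nil
}

func main() {
	for _, u := range []string{"byoh:host:byoh-1", "system:admin"} {
		host, err := hostFromUsername(u)
		fmt.Printf("%q -> host=%q err=%v\n", u, host, err)
	}
}
```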

eugene-chernyshenko, May 15 '23 22:05

I just tried these steps, and it seems to work:

    Identify the machine to delete via kubectl get machines
    Annotate the machine via kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"
    Scale down the KCP/MD via kubectl scale md md-name --replicas=1

haiwu, Jul 31 '23 15:07