cluster-api-provider-bringyourownhost
Remove host from inventory
I am trying to use the byoh provider and I ran into an issue where I cannot delete a host that is not in use. I tried running the agent uninstall on the host side and a delete on the management side, and I get the following error:
The command I am trying to run:
- kubectl delete byohosts.infrastructure.cluster.x-k8s.io lab-preprod-k8s-worker-lab-5
The result:
- Error from server (Forbidden): admission webhook "vbyohost.kb.io" denied the request: byohost.infrastructure.cluster.x-k8s.io "lab-preprod-k8s-worker-lab-5" is forbidden: cannot delete ByoHost when MachineRef is assigned
I have also tried to edit the byoh host object, but I cannot commit any changes when I delete the machine ref.
Can you let me know what should I do?
- Kubernetes version (use `kubectl version --short`): Client Version: v1.22.4, Server Version: v1.22.4
- OS: Ubuntu 20.04
Hi @kerenbenzion, there is a delete webhook that restricts deleting a `ByoHost` CR whose `status.MachineRef` points to a `ByoMachine`. The `Host-Agent` should be running, and then you can try to scale down the `MD`/`KCP`.
Note that removing a specific `ByoHost`/`ByoMachine` from the workload cluster is not supported as of now.
@dharmjit if I understood correctly, currently we can only scale up? We cannot decrease the number of hosts if a machine ref has already been assigned? Will it be supported?
> currently we can do only scale up
We can do both scale up and scale down for both `KCP`/`MD` as per the `CAPI` contract. You can use either of the below to update the number of replicas:
- `kubectl scale kcp|md`
- `kubectl edit kcp|md`
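For example, a rough sketch of both options; the resource names, namespace, and replica counts below are placeholders, not values from this thread:

```shell
# Scale a MachineDeployment ("md" is the kubectl short name) imperatively;
# "my-md" and "default" are placeholder values.
kubectl scale md my-md --replicas=2 -n default

# KubeadmControlPlane ("kcp") also exposes the scale subresource:
kubectl scale kcp my-kcp --replicas=3 -n default

# Alternatively, open the resource and change .spec.replicas by hand:
kubectl edit md my-md -n default
```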
> We cannot decrease number of hosts.
If I understand your question correctly, you can detach hosts from the cluster with a scale-down operation, and once you have identified the host that was removed, you can stop the `Host-Agent` and delete the specific `ByoHost` CR.
Hi,
I am trying to understand: if, let's say, one of my hosts (worker/control plane) has issues (physical/OS) and I need some time to fix it, can I remove the byoh host until the problem is fixed?
> Can I remove the byoh host until the problem is fixed?
Removing a specific host is not supported as of now.
Do you know if/when it will be supported?
Hey @kerenbenzion
Some context on why you are forbidden to delete a byohost directly: we want to avoid any accidental deletion of byohosts from the cluster, which could disrupt your cluster state and the workloads scheduled on that node.
Just re-iterating what @dharmjit mentioned: for any maintenance activity on the host, you should first do a scale-down operation. As of now we can't choose a particular host while scaling down, so your best options are to tear down the cluster so that all hosts are released back to the capacity pool, or to scale down (control plane / worker) and hope your host is the one being deprovisioned.
Since `MachineRef` is set on the `byohost.Status` field, I'm not quite sure you can edit it directly; that is why you are not able to commit changes.
When doing a scale-down operation or cluster deprovision, you can check the agent logs, and when you see something like `MachineRef not set`, you are free to issue the `kubectl delete byohost` command.
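Putting that flow together, a rough sketch; the agent's systemd unit name and the host name below are assumptions, so adjust them to however you run the agent:

```shell
# On the host: follow the host-agent logs until the machine ref is released.
# "byoh-hostagent" as a unit name is an assumption; the agent may also just
# log to stdout depending on how you started it.
journalctl -u byoh-hostagent -f | grep --line-buffered "MachineRef not set"

# Then, on the management cluster, the delete is no longer blocked:
kubectl delete byohost lab-preprod-k8s-worker-lab-5
```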
> Do you know if/when it will be supported?
This is a valid use case and thank you for pointing it out. ~~I will open an issue regarding this (most probably after the holidays 😄 )~~
When the team is back we will look at prioritizing this issue and include in one of the releases.
> I am trying to understand: if, let's say, one of my hosts (worker/control plane) has issues (physical/OS) and I need some time to fix it, can I remove the byoh host until the problem is fixed?
There is a way to do this, but be careful: it is not recommended in a production environment. Delete the entire byoh provider resource first, and then clean up your cluster resources completely. That should solve your current dilemma.
Hi @kerenbenzion, we were not aware of this earlier, but CAPI already provides a way to delete a specific `Machine`. You could perform the steps below:
- Identify the machine to delete via `kubectl get machines`
- Annotate the machine via `kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"`
- Scale down the KCP/MD via `kubectl scale md md-name --replicas=1`

Let us know if this works for you.
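Put together, a sketch of the full sequence; "my-machine" and "my-md" are placeholder names, and the target replica count depends on your current size:

```shell
# 1. Find the machine backing the host you want to remove.
kubectl get machines

# 2. Mark that specific machine as the one to delete first.
kubectl annotate machine my-machine "cluster.x-k8s.io/delete-machine"="yes"

# 3. Scale the MachineDeployment down by one; CAPI prefers
#    annotated machines when choosing which one to deprovision.
kubectl scale md my-md --replicas=1
```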
This might be an absolute hack, but I got around deleting stagnant byoHosts by temporarily removing the `validatingwebhookconfiguration` for `byoHosts`.
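For completeness, a sketch of that workaround. It disables the validation cluster-wide, so restore the webhook afterwards; the configuration and host names below are guesses, so list the configurations first to find the real name:

```shell
# Find the actual webhook configuration name (the name used below is a guess).
kubectl get validatingwebhookconfiguration

# Back it up, remove it, delete the stale host, then restore validation.
# You may need to strip metadata.resourceVersion/uid from the backup YAML
# before re-applying it.
kubectl get validatingwebhookconfiguration byoh-validating-webhook-configuration -o yaml > byoh-webhook.yaml
kubectl delete validatingwebhookconfiguration byoh-validating-webhook-configuration
kubectl delete byohost stale-host-name
kubectl apply -f byoh-webhook.yaml
```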
I had a similar issue with removing the byohost:
```
k delete byoh byoh-1
Error from server (cannot delete ByoHost when MachineRef is assigned): admission webhook "vbyohost.kb.io" denied the request: cannot delete ByoHost when MachineRef is assigned
```
Then I tried to remove the `machineRef`:
```
k edit byoh byoh-1
```
```yaml
machineRef:
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  kind: ByoMachine
  name: byoh-cluster-control-plane-jx95s
  namespace: default
  uid: f36c4978-089b-42a3-8d05-a4148f25b5a8
```
and got this error
```
error: byohosts.infrastructure.cluster.x-k8s.io "byoh-1" could not be patched: Internal error occurred: failed calling webhook "vbyohost.kb.io": failed to call webhook: Post "https://byoh-webhook-service.byoh-system.svc:443/validate-infrastructure-cluster-x-k8s-io-v1beta1-byohost?timeout=10s": EOF
```
```
k -n byoh-system logs -f byoh-controller-manager-6c4555b4dd-szlkp
2023/05/15 21:53:25 http: panic serving 10.42.0.1:48090: runtime error: index out of range [2] with length 2
goroutine 1881 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1825 +0xbf
panic({0x1ae6a20, 0xc000b4a558})
	/usr/local/go/src/runtime/panic.go:844 +0x258
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).handleCreateUpdate(0xc00044fc48, 0xc000c5adc0)
	/workspace/apis/infrastructure/v1beta1/byohost_webhook.go:59 +0x6d5
github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/apis/infrastructure/v1beta1.(*ByoHostValidator).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
	/workspace/apis/infrastructure/v1beta1/byohost_webhook.go:35 +0x117
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000aff7d0, 0x24}, {{0xc000acfc40, 0x1f}, {0xc0005fe338, 0x7}, {0xc0005fe390, ...}}, ...}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:146 +0xa2
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00083b180, {0x7fe8940926b8?, 0xc00047c690}, 0xc00066f100)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:99 +0xe90
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fe8940926b8, 0xc00047c690}, 0x1f2ee00?)
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:40 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1f2ee78?, {0x7fe8940926b8?, 0xc00047c690?}, 0xc000c5ba38?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1f2ee78?, 0xc0001880e0?}, 0xc00066f100)
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:117 +0xaa
net/http.HandlerFunc.ServeHTTP(0x2d0b040?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000c5b9c0?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
	/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:84 +0xbf
net/http.HandlerFunc.ServeHTTP(0x7573f31c1d?, {0x1f2ee78?, 0xc0001880e0?}, 0xc000386070?)
	/usr/local/go/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00067a33f?, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
	/usr/local/go/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0x1f21638?}, {0x1f2ee78, 0xc0001880e0}, 0xc00066f100)
	/usr/local/go/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000782280, {0x1f2fca8, 0xc00084bbc0})
	/usr/local/go/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3071 +0x4db
```
By the way, this happens both while the cluster exists and after removing the cluster.
I just tried these steps, and it seems to work:
1. Identify the machine to delete via `kubectl get machines`
2. Annotate the machine via `kubectl annotate machine machine-name "cluster.x-k8s.io/delete-machine"="yes"`
3. Scale down the KCP/MD via `kubectl scale md md-name --replicas=1`