
vcluster and virtual kubelet do not work together

Open antoinetran opened this issue 1 year ago • 2 comments

What happened?

When deploying kind, then vcluster, then interlink (which deploys a virtual kubelet), the virtual node from interlink is deleted by vcluster.

What did you expect to happen?

The virtual node appears when running kubectl get node.

How can we reproduce it (as minimally and precisely as possible)?

  • deploy kind 0.19.0
  • deploy vcluster 0.18.1 or 0.19.4
  • deploy interlink core component (see https://intertwin-eu.github.io/interLink/docs/tutorial-admins/deploy-interlink)

Anything else we need to know?

See https://github.com/interTwin-eu/interLink/issues/260

This error pattern appears in vcluster logs:

delete virtual node my-vk-node, because it is not needed anymore

This means vcluster deletes it. The relevant vcluster code is at https://github.com/loft-sh/vcluster/blob/v0.19.4/pkg/controllers/resources/nodes/syncer.go#L271, in SyncToHost. I think vcluster treats the virtual kubelet node as one of its own virtual nodes and deletes it because it does not recognize it.

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0

vcluster version

0.18.1 and 0.19.4

VCluster Config

# My vcluster.yaml / values.yaml here

antoinetran avatar Jul 17 '24 16:07 antoinetran

Okay, after digging into the code, I think this is what happens. When the virtual kubelet creates a virtual node through the vcluster API, the node is stored in vcluster's persistence (etcd). The vcluster reconcile function is then called on the node creation event. It checks whether the virtual node (in vcluster's etcd) matches a physical node (known in the host Kubernetes etcd) here. Since there is no physical node backing the virtual kubelet, vcluster deletes it, calling here, hence the log:

delete virtual node %s, because it is not needed anymore

A second problem, not happening for now, is that vcluster does not keep a node if it does not contain any pods (see code here), which may be the case for the virtual kubelet at the beginning.

My recommendation: when a virtual node is created through the vcluster API, but NOT through the host Kubernetes API, this node should be marked as not managed by vcluster and excluded both from the check for a counterpart physical node and from the check for whether the virtual node contains any pods. Its lifecycle should not be managed by vcluster.
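
To make this concrete, here is a minimal sketch of the guard I have in mind (this is not vcluster's actual code, and the annotation key is made up for illustration):

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // Hypothetical marker set on nodes that were created through the vcluster API
    // but have no counterpart in the host cluster (e.g. an interLink virtual node).
    const unmanagedAnnotation = "example.com/not-managed-by-vcluster"

    // shouldDeleteVirtualNode is a simplified version of the two checks described
    // above: a node is only a deletion candidate if vcluster manages it, and it is
    // deleted when it no longer matches a physical node or no longer runs any pod.
    func shouldDeleteVirtualNode(node *corev1.Node, physicalNodeExists, hasPods bool) bool {
        if node.Annotations[unmanagedAnnotation] == "true" {
            return false // lifecycle is owned by something else (e.g. the virtual kubelet)
        }
        return !physicalNodeExists || !hasPods
    }

    func main() {
        vkNode := &corev1.Node{ObjectMeta: metav1.ObjectMeta{
            Name:        "interlink-slurm-node",
            Annotations: map[string]string{unmanagedAnnotation: "true"},
        }}
        // Kept even though there is no physical node and no pods yet.
        fmt.Println(shouldDeleteVirtualNode(vkNode, false, false))
    }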

antoinetran avatar Jul 18 '24 08:07 antoinetran

Another possible fix, though uglier, is to add a vcluster Helm configuration option so that vcluster does not manage any virtual node whose name matches a regex pattern set in the vcluster Helm values.yaml.
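
Something along these lines; the Helm value name and the pattern are hypothetical, no such option exists in vcluster today:

    package main

    import (
        "fmt"
        "regexp"
    )

    // The pattern would come from a hypothetical Helm value, e.g.
    // sync.nodes.unmanagedNameRegex in values.yaml (this option does not exist).
    var unmanagedNodePattern = regexp.MustCompile(`^interlink-.*$`)

    // isUnmanagedNode reports whether the node syncer should leave this node alone
    // instead of treating it as one of its own virtual nodes.
    func isUnmanagedNode(name string) bool {
        return unmanagedNodePattern.MatchString(name)
    }

    func main() {
        for _, name := range []string{"interlink-slurm-node", "2-rcr6j-worker-0-l5c86"} {
            fmt.Printf("%s -> skip node sync: %v\n", name, isUnmanagedNode(name))
        }
    }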

antoinetran avatar Jul 18 '24 08:07 antoinetran

This commit https://github.com/loft-sh/vcluster/commit/28c7b75dacb0642914ceee175d48bfea50d6b8cb added a way to handle node sync so that only nodes labelled by vcluster itself can be deleted:

Delete nodes which have managed-by label, ignore other nodes

=> http://github.com/loft-sh/vcluster/blob/28c7b75dacb0642914ceee175d48bfea50d6b8cb/pkg/controllers/resources/csinodes/syncer.go#L101

	// Set the marker of managed-by vcluster so that
	// we skip deleting the nodes which are not managed
	// by vcluster in `SyncToHost` function

So it seems this was fixed (on 2025-02-18). I will need to validate.

antoinetran avatar Aug 12 '25 16:08 antoinetran

The fix is in vcluster >= 0.23, but after testing with vcluster 0.25.0, I still reproduce the issue. Syncer logs:

2025-08-13 15:34:25     INFO    fake-node.interlink-slurm-node  syncer/syncer_fake.go:93        Delete fake node interlink-slurm-node as it is not needed anymore      {"component": "vcluster"}

antoinetran avatar Aug 13 '25 15:08 antoinetran

Same issue with the latest stable vcluster, 0.27.0.

antoinetran avatar Aug 13 '25 16:08 antoinetran

Ok, I think the fix made by @kale-amruta kind of worked, because the node deletion is now done by syncer_fake.go and not by the nodes syncer.go. So I will try to copy the same fix and see whether we can avoid deleting a node when it is not managed by vcluster.

antoinetran avatar Aug 14 '25 11:08 antoinetran

So I fixed the deletion part: now vcluster does not remove the fake node that InterLink (Virtual Kubelet) added. The vcluster logs (with grep fake):

2025-08-14 14:08:41     INFO    fake-node.2-rcr6j-worker-0-l5c86        syncer/syncer_fake.go:84        Create fake node 2-rcr6j-worker-0-l5c86 {"component": "vcluster"}
2025-08-14 14:09:13     INFO    fake-node.2-rcr6j-worker-0-8k65q        syncer/syncer_fake.go:84        Create fake node 2-rcr6j-worker-0-8k65q {"component": "vcluster"}
2025-08-14 14:09:42     INFO    fake-node.2-rcr6j-worker-0-jh6fh        syncer/syncer_fake.go:84        Create fake node 2-rcr6j-worker-0-jh6fh {"component": "vcluster"}
2025-08-14 14:09:55     INFO    fake-node.2-rcr6j-worker-0-jh6fh        syncer/syncer_fake.go:93        Delete fake node 2-rcr6j-worker-0-jh6fh as it is not needed anymore     {"component": "vcluster"}
2025-08-14 14:09:59     INFO    fake-node.2-rcr6j-worker-0-jh6fh        syncer/syncer_fake.go:84        Create fake node 2-rcr6j-worker-0-jh6fh {"component": "vcluster"}
2025-08-14 14:10:00     INFO    fake-persistentvolume.pvc-30522a26-791f-4064-a18a-f838a400655d  syncer/syncer_fake.go:84        Create fake persistent volume for PVC jupyter/hub-db-dir        {"component": "vcluster"}
2025-08-14 14:10:16     INFO    fake-node.2-rcr6j-worker-0-d5fj5        syncer/syncer_fake.go:84        Create fake node 2-rcr6j-worker-0-d5fj5 {"component": "vcluster"}
2025-08-14 14:22:54     INFO    fake-persistentvolume.pvc-8733cc9c-6a0b-4b85-896a-0b7bb22521e2  syncer/syncer_fake.go:84        Create fake persistent volume for PVC nextcloud/data-my-nextcloud-mariadb-0     {"component": "vcluster"}
2025-08-14 14:22:54     INFO    fake-persistentvolume.pvc-335d9e9f-9d26-4146-b93a-5a00a753bebf  syncer/syncer_fake.go:84        Create fake persistent volume for PVC nextcloud/my-nextcloud-nextcloud  {"component": "vcluster"}
2025-08-14 14:47:10     INFO    fake-node.interlink-slurm-node  syncer/syncer_fake.go:93        Unmanaged fake node interlink-slurm-node, doing nothing.        {"component": "vcluster"}
2025-08-14 14:47:40     INFO    fake-node.interlink-slurm-node  syncer/syncer_fake.go:93        Unmanaged fake node interlink-slurm-node, doing nothing.        {"component": "vcluster"}
2025-08-14 14:48:10     INFO    fake-node.interlink-slurm-node  syncer/syncer_fake.go:93        Unmanaged fake node interlink-slurm-node, doing nothing.        {"component": "vcluster"}
2025-08-14 14:48:40     INFO    fake-node.interlink-slurm-node  syncer/syncer_fake.go:93        Unmanaged fake node interlink-slurm-node, doing nothing.        {"component": "vcluster"}

So the new node stays alive!

antoinetran avatar Aug 14 '25 15:08 antoinetran

However (I don't know if I should create another ticket), the pod that I launch in vcluster does not see the new node. It is as if the new node never existed, maybe because this is indeed a "fake" node? I would need some help here.

antoinetran avatar Aug 14 '25 15:08 antoinetran

Ok, I took a look at the virtualScheduler and hybridScheduler docs and code. I tried them, but it didn't work, because they require extensive RBAC on the OpenShift cluster, which I will never have.

The goal is to schedule a pod on the interlink virtual node only when a soft/hard affinity to this node has been set on the pod, and to schedule everything else with the host scheduler, without vcluster managing the host nodes as in virtualScheduler or hybridScheduler mode. But I will admit it is complicated for me to add this in code.
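
For reference, this is the kind of pod I mean: a sketch that pins a pod to the interLink virtual node with a hard node affinity (the node name and the toleration key are assumptions from my setup); pods without this affinity would stay on the regular nodes:

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "sigs.k8s.io/yaml"
    )

    func main() {
        pod := corev1.Pod{
            TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
            ObjectMeta: metav1.ObjectMeta{Name: "offloaded-job"},
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{Name: "main", Image: "busybox", Command: []string{"sleep", "60"}}},
                // Hard affinity: only this pod is sent to the interLink virtual node;
                // pods without such affinity are scheduled on the other nodes.
                Affinity: &corev1.Affinity{NodeAffinity: &corev1.NodeAffinity{
                    RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
                        NodeSelectorTerms: []corev1.NodeSelectorTerm{{
                            MatchExpressions: []corev1.NodeSelectorRequirement{{
                                Key:      "kubernetes.io/hostname",
                                Operator: corev1.NodeSelectorOpIn,
                                Values:   []string{"interlink-slurm-node"},
                            }},
                        }},
                    },
                }},
                // Virtual kubelet nodes are usually tainted; the exact taint key
                // depends on the interLink setup, so this key is an assumption.
                Tolerations: []corev1.Toleration{{
                    Key:      "virtual-kubelet.io/provider",
                    Operator: corev1.TolerationOpExists,
                }},
            },
        }
        out, _ := yaml.Marshal(pod)
        fmt.Println(string(out))
    }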

antoinetran avatar Aug 19 '25 10:08 antoinetran