[v2]: How to deal with node deletion/eviction
Currently, the implicit behaviour of the Operator is:
- If a LinstorSatellite resource should no longer exist, for example because it no longer matches the node labels, delete it.
- If a LinstorSatellite resource is deleted, it is "finalized" by triggering node evacuation and then either waiting for the node to go offline and be marked "lost", or, if the node remains online, waiting for all resources to be moved to other nodes.
This has already caused pain for some users who were not expecting this behaviour. In particular, it makes it hard to "undelete" a satellite should that be desired.
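A rough way to watch this happening (the namespace and deployment names below assume a default piraeus-datastore install):

```bash
# Watch the LinstorSatellite resources being finalized by the operator
kubectl get linstorsatellites.piraeus.io

# Watch LINSTOR evacuating and eventually removing the node
kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor node list
kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor resource list
```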
I can't delete a LinstorSatellite at all, even if I shrink the satelliteset... Were you able to delete one? Those have a piraeus.io/satellite-protection finalizer; I could remove those manually, but what's the best way to delete the unused ones?
Deleting the finalizer will remove them from the operator's memory. You would then need to manually run `linstor node lost <nodename>` if the node still exists in LINSTOR to get it removed.
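A minimal sketch of those two steps, assuming the LinstorSatellite carries the node's name and the controller runs as the linstor-controller deployment in the piraeus-datastore namespace:

```bash
# Drop the piraeus.io/satellite-protection finalizer so the operator forgets the satellite
kubectl patch linstorsatellite <nodename> --type merge -p '{"metadata":{"finalizers":null}}'

# If the node is still registered in LINSTOR, remove it there as well
kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor node lost <nodename>
```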
They keep reappearing even after deleting the satellite and "node lost"
What's the normal way to delete a node/satellite/pod/etc?
It will appear again, as long as:
- The node exists in Kubernetes (`kubectl get nodes`).
- The node is not excluded by the `spec.nodeSelector` on the `LinstorCluster` resource.
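A quick way to check both conditions (this assumes the LinstorCluster resource is named linstorcluster, as in the default setup):

```bash
# Is the node still registered in Kubernetes?
kubectl get nodes

# Which nodeSelector, if any, is set on the LinstorCluster?
kubectl get linstorcluster linstorcluster -o jsonpath='{.spec.nodeSelector}'

# Do the labels on the node still match that selector?
kubectl get node <nodename> --show-labels
```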
Are these "and" or "or"? The nodes are now excluded, but still exist in the cluster
Currently also being affected by this issue.
+1 on this issue.
What does the LinstorCluster resource look like? Does it have a spec.nodeSelector set?
Then, can you show the labels on one of the affected nodes?
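For example (resource and node names are placeholders):

```bash
kubectl get linstorcluster -o yaml
kubectl get node <affected-node> --show-labels
```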
I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!
> I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!
@dimm0 can you share config please?
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
creationTimestamp: "2023-05-31T23:46:16Z"
generation: 6
name: linstorcluster
resourceVersion: "6323433601"
uid: d1eac6b0-150c-4068-95fc-f21dbff1dabd
spec:
nodeSelector:
nautilus.io/linstor: "true"
patches:
- patch: |-
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: linstor-csi-node
spec:
template:
spec:
nodeSelector:
nautilus.io/linstor: "true"
tolerations:
- effect: PreferNoSchedule
operator: Exists
- effect: NoSchedule
key: nautilus.io/gpu-bad
operator: Exists
- effect: NoSchedule
key: nautilus.io/large-gpu
operator: Exists
- effect: NoSchedule
key: nautilus.io/testing
operator: Exists
- effect: NoSchedule
key: nautilus.io/sdsu-fix
operator: Exists
- effect: NoSchedule
key: nautilus.io/stashcache
operator: Exists
- effect: NoSchedule
key: nautilus.io/sdsu-fix
operator: Exists
- effect: NoSchedule
key: nautilus.io/nrp-testing
operator: Exists
- effect: NoSchedule
key: nautilus.io/haosu
operator: Exists
- effect: NoSchedule
key: nautilus.io/ceph
operator: Exists
- effect: NoSchedule
key: nautilus.io/science-dmz
operator: Exists
- effect: NoSchedule
key: nautilus.io/noceph
operator: Exists
- effect: NoSchedule
key: drbd.linbit.com/lost-quorum
- effect: NoSchedule
key: drbd.linbit.com/force-io-error
target:
kind: DaemonSet
name: linstor-csi-node
- patch: |-
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: ha-controller
spec:
template:
spec:
nodeSelector:
nautilus.io/linstor: "true"
tolerations:
- effect: PreferNoSchedule
operator: Exists
- effect: NoSchedule
key: nautilus.io/gpu-bad
operator: Exists
- effect: NoSchedule
key: nautilus.io/large-gpu
operator: Exists
- effect: NoSchedule
key: nautilus.io/testing
operator: Exists
- effect: NoSchedule
key: nautilus.io/sdsu-fix
operator: Exists
- effect: NoSchedule
key: nautilus.io/stashcache
operator: Exists
- effect: NoSchedule
key: nautilus.io/nrp-testing
operator: Exists
- effect: NoSchedule
key: nautilus.io/haosu
operator: Exists
- effect: NoSchedule
key: nautilus.io/ceph
operator: Exists
- effect: NoSchedule
key: nautilus.io/science-dmz
operator: Exists
- effect: NoSchedule
key: nautilus.io/noceph
operator: Exists
- effect: NoSchedule
key: drbd.linbit.com/lost-quorum
- effect: NoSchedule
key: drbd.linbit.com/force-io-error
target:
kind: DaemonSet
name: ha-controller
status:
conditions:
- lastTransitionTime: "2023-08-30T04:57:04Z"
message: Resources applied
observedGeneration: 6
reason: AsExpected
status: "True"
type: Applied
- lastTransitionTime: "2023-08-30T04:40:58Z"
message: 'Controller 1.24.1 (API: 1.20.1, Git: f7d71e7c5416c84d6cdb05f9b490d9b1e81622e2)
reachable at ''http://linstor-controller.piraeus-datastore.svc:3370'''
observedGeneration: 6
reason: AsExpected
status: "True"
type: Available
- lastTransitionTime: "2023-08-30T05:42:29Z"
message: Properties applied
observedGeneration: 6
reason: AsExpected
status: "True"
type: Configured
The nodes have nautilus.io/linstor: "true" label set
> The nodes have nautilus.io/linstor: "true" label set
@dimm0 can you do it with nodes which do not have the "node-role.kubernetes.io/control-plane" label?
Do you want to exclude the master? I don't think you can do it... nodeSelector doesn't support negative selection. Basically you have to label all nodes except that one. Not too convenient...
> Do you want to exclude the master? I don't think you can do it... nodeSelector doesn't support negative selection. Basically you have to label all nodes except that one. Not too convenient...
Yes, that's what I want. I want to exclude all masters, but I don't understand how to do it
Not possible with the current selector. If it supported nodeAffinity, you'd be able to exclude them.
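Until that exists, the only workaround is labeling everything except the control-plane nodes; a sketch, assuming masters carry the standard node-role.kubernetes.io/control-plane label and reusing the label from the config above:

```bash
# Label every node that does NOT carry the control-plane role label
kubectl label nodes -l '!node-role.kubernetes.io/control-plane' nautilus.io/linstor=true
```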
> If it supported nodeAffinity, you'd be able to exclude them.
Please open a feature request, should not be too hard to implement :)
> I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!
Ok, so now you do not have any unexpected satellites anymore?
> Please open a feature request, should not be too hard to implement :)
Will do!
> Ok, so now you do not have any unexpected satellites anymore?
Finally!!! :)
> Please open a feature request, should not be too hard to implement :)
Will do! It's done!
BTW, I don't fully understand how it currently creates the nodes. If a pod can't be scheduled for some reason (it stays Pending) but its node matches the nodeSelector, the node will still show up in LINSTOR. Does that mean you implement the label selection yourself? Wouldn't it be better to get the list of running pods from Kubernetes and use those?
I'm just thinking that if you add nodeAffinity, you'll have to rely on k8s anyway.
Yes, it's currently a (bad) reimplementation of the Kubernetes scheduler.
The reason is: we need to support every node having a slightly different Pod spec for the satellite. That is because the Pod spec for a satellite might be different based on:
- The host OS (different DRBD loader).
- Different storage pool configuration (The file pools may need different host path volumes).
- Perhaps different overrides, such as user-applied patches.
All of these are features we think are useful, but they make it hard to use a normal DaemonSet. So we need to use raw Pods instead. Perhaps we can somehow reuse the logic of the kube-controller-manager when creating the DaemonSet Pods; I need to look into that.
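To illustrate why the Pod specs end up differing per node: depending on the operator version, a per-node-group override can be expressed with a LinstorSatelliteConfiguration, roughly like this (a sketch; the label is made up and field availability may vary between releases):

```yaml
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: example-override
spec:
  # Hypothetical label: only satellites on matching nodes get this override
  nodeSelector:
    example.com/satellite-flavour: special
  # Per-group tweak to the satellite Pod spec, e.g. host networking
  podTemplate:
    spec:
      hostNetwork: true
```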
Question: how do I modify LINSTOR resource definitions to remove a nonexistent node from them and finalize deletion of the node, when the node was already deleted from Kubernetes?
Only Diskless resources were on that node, and they had already been placed on another node as Diskless, but somehow they stayed defined there as well. I didn't think to double-check and just deleted the node from Kubernetes.
I issued `linstor n d api001` afterwards, and `linstor n l` shows the node's state as DELETE.
One example of such a resource (`linstor r l`):
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ api001 ┊ 7039 ┊ ┊ Ok ┊ DELETING ┊ 2024-11-06 19:43:42 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ node001 ┊ 7039 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-11-06 19:43:40 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ node002 ┊ 7039 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-11-06 19:43:43 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ redis002 ┊ 7039 ┊ InUse ┊ Ok ┊ Diskless ┊ 2024-11-06 20:38:11 ┊
Part of the error reported:
Error context:
Deletion of resource 'pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be' on node 'api001' failed due to an unknown exception.
Asynchronous stage backtrace:
No connection to satellite 'api001'
Error has been observed at the following site(s):
*__checkpoint ⇢ Prepare resource delete
*__checkpoint ⇢ Activating resource if necessary before deletion
Sorry for adding a question here, but this seems like the correct issue for the problem I am facing now.
Try running a `linstor node lost ...`; that should cause LINSTOR to forcefully remove the node from all DRBD configurations.
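For the node above that would be:

```bash
# Forcefully remove the vanished node from LINSTOR and its DRBD configuration
linstor node lost api001
```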
Thank you very much, it resolved the issue.