
[v2]: How to deal with node deletion/eviction

Open WanzenBug opened this issue 2 years ago • 21 comments

Currently, the implicit behaviour of the Operator is:

If a LinstorSatellite resource should no longer exist (because it no longer matches the node labels, etc.), delete it. If a LinstorSatellite resource is deleted, it will be "finalized" by triggering node evacuation and then either waiting for the node to go offline so it can be marked "lost", or, if the node remains online, waiting for all resources to be moved to other nodes.
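
Roughly, the finalizer performs the equivalent of these manual LINSTOR CLI steps (a sketch only; the node name is a placeholder):

linstor node evacuate <nodename>   # move resources off the satellite
# then either wait for all resources to land on other nodes (node stays online), or:
linstor node lost <nodename>       # once the node has gone offline and will not return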

This has already caused some pain for users who were not expecting this behaviour. In particular, it makes it hard to "undelete" a satellite should that be desired.

WanzenBug avatar Aug 10 '23 06:08 WanzenBug

I can't delete a linstorsatellite at all, even if I shrink the satelliteset... Were you able to delete one? Those have a piraeus.io/satellite-protection finalizer; I could remove them manually, but what's the best way to delete the unused ones?

dimm0 avatar Aug 11 '23 17:08 dimm0

Deleting the finalizer will remove them from the Operator's memory. You would then need to manually run linstor node lost <nodename> if the node still exists in LINSTOR, to get it removed there as well.
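
For example (a sketch; the satellite/node name is a placeholder):

# remove the piraeus.io/satellite-protection finalizer from the LinstorSatellite
kubectl patch linstorsatellite <nodename> --type=merge -p '{"metadata":{"finalizers":null}}'
# then, if the node still exists in LINSTOR:
linstor node lost <nodename>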

WanzenBug avatar Aug 16 '23 08:08 WanzenBug

They keep reappearing even after deleting the satellite and "node lost"

What's the normal way to delete a node/satellite/pod/etc?

dimm0 avatar Aug 17 '23 23:08 dimm0

It will appear again, as long as:

  • The node exists in Kubernetes (kubectl get nodes)
  • The node is not excluded by the spec.nodeSelector on the LinstorCluster resource (see the minimal example below).
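
A minimal LinstorCluster with such a selector might look like this (a sketch; the label name is only an example, use whatever label you put on your storage nodes):

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  nodeSelector:
    example.com/storage: "true"   # only nodes carrying this label get a LinstorSatellite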

WanzenBug avatar Aug 18 '23 06:08 WanzenBug

Are these "and" or "or"? The nodes are now excluded, but still exist in the cluster

dimm0 avatar Aug 18 '23 07:08 dimm0

Currently also being affected by this issue.

kkrick-sdsu avatar Aug 18 '23 15:08 kkrick-sdsu

+1 on this issue.

mfarley-sdsu avatar Aug 18 '23 16:08 mfarley-sdsu

What does the LinstorCluster resource look like? Does it have a spec.nodeSelector set?

Then, can you show the labels on one of the affected nodes?
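
For example (the node name is a placeholder):

kubectl get node <node-name> --show-labels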

WanzenBug avatar Aug 29 '23 12:08 WanzenBug

I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!

dimm0 avatar Aug 30 '23 04:08 dimm0

I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!

@dimm0 can you share config please?

online01993 avatar Aug 30 '23 05:08 online01993

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  creationTimestamp: "2023-05-31T23:46:16Z"
  generation: 6
  name: linstorcluster
  resourceVersion: "6323433601"
  uid: d1eac6b0-150c-4068-95fc-f21dbff1dabd
spec:
  nodeSelector:
    nautilus.io/linstor: "true"
  patches:
  - patch: |-
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: linstor-csi-node
      spec:
        template:
          spec:
            nodeSelector:
              nautilus.io/linstor: "true"
            tolerations:
            - effect: PreferNoSchedule
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/gpu-bad
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/large-gpu
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/testing
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/sdsu-fix
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/stashcache
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/sdsu-fix
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/nrp-testing
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/haosu
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/ceph
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/science-dmz
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/noceph
              operator: Exists
            - effect: NoSchedule
              key: drbd.linbit.com/lost-quorum
            - effect: NoSchedule
              key: drbd.linbit.com/force-io-error
    target:
      kind: DaemonSet
      name: linstor-csi-node
  - patch: |-
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: ha-controller
      spec:
        template:
          spec:
            nodeSelector:
              nautilus.io/linstor: "true"
            tolerations:
            - effect: PreferNoSchedule
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/gpu-bad
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/large-gpu
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/testing
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/sdsu-fix
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/stashcache
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/nrp-testing
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/haosu
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/ceph
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/science-dmz
              operator: Exists
            - effect: NoSchedule
              key: nautilus.io/noceph
              operator: Exists
            - effect: NoSchedule
              key: drbd.linbit.com/lost-quorum
            - effect: NoSchedule
              key: drbd.linbit.com/force-io-error
    target:
      kind: DaemonSet
      name: ha-controller
status:
  conditions:
  - lastTransitionTime: "2023-08-30T04:57:04Z"
    message: Resources applied
    observedGeneration: 6
    reason: AsExpected
    status: "True"
    type: Applied
  - lastTransitionTime: "2023-08-30T04:40:58Z"
    message: 'Controller 1.24.1 (API: 1.20.1, Git: f7d71e7c5416c84d6cdb05f9b490d9b1e81622e2)
      reachable at ''http://linstor-controller.piraeus-datastore.svc:3370'''
    observedGeneration: 6
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2023-08-30T05:42:29Z"
    message: Properties applied
    observedGeneration: 6
    reason: AsExpected
    status: "True"
    type: Configured

dimm0 avatar Aug 30 '23 05:08 dimm0

The nodes have nautilus.io/linstor: "true" label set

dimm0 avatar Aug 30 '23 06:08 dimm0

The nodes have nautilus.io/linstor: "true" label set

@dimm0 can you do it with nodes which do not have the "node-role.kubernetes.io/control-plane" label?

online01993 avatar Aug 30 '23 06:08 online01993

Do you want to exclude the master? I don't think you can do that... nodeSelector doesn't support negative selection. Basically you have to label all nodes except that one. Not too convenient...
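
One way to do that bulk labeling (a sketch; it assumes control-plane nodes carry the standard node-role.kubernetes.io/control-plane label and reuses the nautilus.io/linstor label from the config above):

kubectl label nodes -l '!node-role.kubernetes.io/control-plane' nautilus.io/linstor=true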

dimm0 avatar Aug 30 '23 06:08 dimm0

Do you want to exclude the master? I don't think you can do that... nodeSelector doesn't support negative selection. Basically you have to label all nodes except that one. Not too convenient...

Yes, that's what I want. I want to exclude all masters, but I don't understand how to do it

online01993 avatar Aug 30 '23 06:08 online01993

Not possible with the current selector. If it supported nodeAffinity, you'd be able to exclude them..
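
Purely as an illustration, such an exclusion would look roughly like standard Kubernetes node affinity (hypothetical; the Operator's selector does not support this yet):

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist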

dimm0 avatar Aug 30 '23 06:08 dimm0

If it supported nodeAffinity, you'd be able to exclude them..

Please open a feature request, should not be too hard to implement :)

I did not have the nodeSelector in linstorcluster, only in satellitesets. Adding it to cluster and restarting all controllers seemed to help, thanks!

Ok, so now you do not have any unexpected satellites anymore?

WanzenBug avatar Aug 30 '23 06:08 WanzenBug

Please open a feature request, should not be too hard to implement :)

Will do!

Ok, so now you do not have any unexpected satellites anymore?

Finally!!! :)

dimm0 avatar Aug 30 '23 06:08 dimm0

Please open a feature request, should not be too hard to implement :)

Will do! It's done!

online01993 avatar Aug 30 '23 06:08 online01993

BTW, I don't fully understand how it currently creates the nodes. If a pod can't be scheduled for some reason (it stays pending) but the node matches the nodeSelector, it still becomes a node in LINSTOR. Does that mean you implement the label selection yourself? Wouldn't it be better to get the list of running pods from Kubernetes and use those?

I'm just thinking that if you add nodeAffinity, you'll have to rely on k8s anyway.

dimm0 avatar Aug 30 '23 06:08 dimm0

Yes, it's currently a (bad) reimplementation of the Kubernetes scheduler.

The reason is: we need to support every node having a slightly different Pod spec for the satellite. That is because the Pod spec for a satellite might be different based on:

  • The host OS (different DRBD loader).
  • Different storage pool configuration (The file pools may need different host path volumes).
  • Perhaps different overrides, such as user-applied patches.

All of these are features we think are useful, but they make it hard to use a normal DaemonSet. So we need to use raw Pods instead. Perhaps we can somehow reuse the logic of the kube-controller-manager when creating the DaemonSet Pods; I need to look into that.
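
As an illustration of the storage pool point above (a hypothetical excerpt, not the Operator's actual template): a file-backed pool needs a hostPath volume whose path can differ from node to node, which a single DaemonSet pod template cannot express:

volumes:
- name: file-pool
  hostPath:
    path: /var/lib/piraeus/pool1   # example path; may differ per node
    type: DirectoryOrCreate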

WanzenBug avatar Aug 30 '23 07:08 WanzenBug

Question: how do I modify LINSTOR resource definitions to remove a nonexistent node from them, and finalize the deletion of a node that was already deleted from Kubernetes?

Only Diskless resources were on that node, and they were already set up as Diskless on another node, but they somehow stayed defined there as well. I didn't think to double-check and just deleted the node from Kubernetes.

I issued linstor n d api001 afterwards, and linstor n l shows the node's state as DELETE.

One example of such a resource (linstor r l):

┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ api001   ┊ 7039 ┊        ┊ Ok    ┊   DELETING ┊ 2024-11-06 19:43:42 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ node001  ┊ 7039 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-11-06 19:43:40 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ node002  ┊ 7039 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-11-06 19:43:43 ┊
┊ pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be ┊ redis002 ┊ 7039 ┊ InUse  ┊ Ok    ┊   Diskless ┊ 2024-11-06 20:38:11 ┊

Part of the error reported:

Error context:
        Deletion of resource 'pvc-8cd6c577-3a93-4e2b-a13c-a2ab175e76be' on node 'api001' failed due to an unknown exception.
Asynchronous stage backtrace:
        No connection to satellite 'api001'
    
    Error has been observed at the following site(s):
        *__checkpoint ? Prepare resource delete
        *__checkpoint ? Activating resource if necessary before deletion

Sorry for adding a question here, but this seems like the correct issue for the problem I am facing.

RichardSufliarsky avatar Nov 13 '24 12:11 RichardSufliarsky

Try running linstor node lost .... That should cause LINSTOR to forcefully remove the node from all DRBD configurations.
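
For the node from the report above, that would be:

linstor node lost api001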

WanzenBug avatar Nov 13 '24 12:11 WanzenBug

Thank you very much, it resolved the issue.

RichardSufliarsky avatar Nov 13 '24 12:11 RichardSufliarsky