linstor-server

Can't evacuate failed node

Open kvaps opened this issue 3 years ago • 0 comments

# linstor r l 
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node           ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-0a73b994-416e-4889-84b8-173cc403633d ┊ hf-kubevirt-01 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:15:33 ┊
┊ pvc-2e9c2021-197b-433e-9da2-6cec2926865b ┊ hf-kubevirt-01 ┊ 7003 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:16:33 ┊
┊ pvc-2e9c2021-197b-433e-9da2-6cec2926865b ┊ hf-kubevirt-03 ┊ 7003 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-29 07:28:15 ┊
┊ pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 ┊ hf-kubevirt-01 ┊ 7000 ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:07:24 ┊
┊ pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 ┊ hf-kubevirt-03 ┊ 7000 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2022-07-29 07:28:15 ┊
┊ pvc-ade32295-dbce-43c6-8078-06e9be10985c ┊ hf-kubevirt-01 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:15:58 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Then I took the NVMe devices on node hf-kubevirt-03 offline by removing them from the PCI bus:

root@hf-kubevirt-03 / # echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
root@hf-kubevirt-03 / # echo 1 > /sys/bus/pci/devices/0000\:0a\:00.0/remove
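
For anyone reproducing this, here is a minimal sketch of the same removal wrapped in helpers. The function names and the `DRY_RUN` variable are hypothetical, not part of any tool. By default it only prints what it would do. Writing to `/sys/bus/pci/rescan` should ask the kernel to rediscover the removed devices afterwards. Doing it for real requires root.

```shell
#!/bin/sh
# Build the sysfs "remove" path for a PCI address such as 0000:09:00.0.
pci_remove_path() {
    printf '/sys/bus/pci/devices/%s/remove\n' "$1"
}

# Print the removal command (DRY_RUN=1, the default here) or perform it.
pci_remove() {
    path=$(pci_remove_path "$1")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "echo 1 > $path"
    else
        echo 1 > "$path"
    fi
}

# Ask the kernel to rescan the PCI bus so removed devices can reappear.
pci_rescan() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "echo 1 > /sys/bus/pci/rescan"
    else
        echo 1 > /sys/bus/pci/rescan
    fi
}

# The two NVMe devices from this report, then a rescan to restore them.
pci_remove 0000:09:00.0
pci_remove 0000:0a:00.0
pci_rescan
```

Set `DRY_RUN=0` to actually write to sysfs.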

The device switched to diskless mode:

# linstor r l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node           ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-0a73b994-416e-4889-84b8-173cc403633d ┊ hf-kubevirt-01 ┊ 7001 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:15:33 ┊
┊ pvc-2e9c2021-197b-433e-9da2-6cec2926865b ┊ hf-kubevirt-01 ┊ 7003 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:16:33 ┊
┊ pvc-2e9c2021-197b-433e-9da2-6cec2926865b ┊ hf-kubevirt-03 ┊ 7003 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-29 07:28:15 ┊
┊ pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 ┊ hf-kubevirt-01 ┊ 7000 ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:07:24 ┊
┊ pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 ┊ hf-kubevirt-03 ┊ 7000 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2022-07-29 07:28:15 ┊
┊ pvc-ade32295-dbce-43c6-8078-06e9be10985c ┊ hf-kubevirt-01 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2022-07-28 09:15:58 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The storage pool list reports an error:

# linstor sp l
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node           ┊ Driver   ┊ PoolName              ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ hf-kubevirt-01 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ hf-kubevirt-02 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ hf-kubevirt-03 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ ssd-lvm              ┊ hf-kubevirt-01 ┊ LVM      ┊ linstor_data          ┊     1.69 TiB ┊      3.49 TiB ┊ False        ┊ Ok    ┊            ┊
┊ ssd-lvm              ┊ hf-kubevirt-02 ┊ LVM      ┊ linstor_data          ┊     1.69 TiB ┊      3.49 TiB ┊ False        ┊ Ok    ┊            ┊
┊ ssd-lvm              ┊ hf-kubevirt-03 ┊ LVM      ┊ linstor_data          ┊     1.69 TiB ┊      3.49 TiB ┊ False        ┊ Ok    ┊            ┊
┊ ssd-lvmthin          ┊ hf-kubevirt-01 ┊ LVM_THIN ┊ linstor_data/thindata ┊     1.80 TiB ┊      1.80 TiB ┊ True         ┊ Ok    ┊            ┊
┊ ssd-lvmthin          ┊ hf-kubevirt-02 ┊ LVM_THIN ┊ linstor_data/thindata ┊     1.80 TiB ┊      1.80 TiB ┊ True         ┊ Ok    ┊            ┊
┊ ssd-lvmthin          ┊ hf-kubevirt-03 ┊ LVM_THIN ┊ linstor_data/thindata ┊        0 KiB ┊         0 KiB ┊ True         ┊ Error ┊            ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'hf-kubevirt-03', storage pool: 'ssd-lvmthin' - Failed to query free space from storage pool
Cause:
    Volume group 'linstor_data' not found

Let's try to delete the failed storage pool:

# linstor sp d hf-kubevirt-03 ssd-lvmthin
ERROR:
Description:
    The specified storage pool 'ssd-lvmthin' on node 'hf-kubevirt-03' can not be deleted as volumes / snapshot-volumes are still using it.
Correction:
    Delete the listed volumes and snapshot-volumes first.
Details:
    Volumes / snapshot-volumes that are still using the storage pool:
       Node name: 'hf-kubevirt-03', resource name: 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b', volume number: 0
       Node name: 'hf-kubevirt-03', resource name: 'pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085', volume number: 0
    Node: hf-kubevirt-03, Storage pool name: ssd-lvmthin
Show reports:
    linstor error-reports show 62E228CF-00000-000002
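
The "Details" section above names every volume that still blocks the deletion. As a rough sketch, assuming the client output has been saved or piped in as plain text in exactly the format shown above, the node/resource pairs can be pulled out with `sed`. The helper name `blocking_resources` is hypothetical.

```shell
#!/bin/sh
# Print "node resource" pairs from the "still using the storage pool" details on stdin.
blocking_resources() {
    sed -n "s/.*Node name: '\([^']*\)', resource name: '\([^']*\)', volume number: .*/\1 \2/p"
}

# Sample input copied from the error message in this report.
blocking_resources <<'EOF'
    Volumes / snapshot-volumes that are still using the storage pool:
       Node name: 'hf-kubevirt-03', resource name: 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b', volume number: 0
       Node name: 'hf-kubevirt-03', resource name: 'pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085', volume number: 0
    Node: hf-kubevirt-03, Storage pool name: ssd-lvmthin
EOF
```

Lines that do not match the pattern, such as the trailing "Node: ..., Storage pool name: ..." line, are ignored.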

OK, got it. Then let's try to evacuate the node:

# linstor n evacuate hf-kubevirt-03
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' was removed as there are not enough resources for quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' was removed as there are not enough resources for quorum
SUCCESS:
Description:
    Node: hf-kubevirt-03, Resource: pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 preparing for deletion.
Details:
    Node: hf-kubevirt-03, Resource: pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085 UUID is: e3a7868f-deb2-4883-97ba-02be93982a20
SUCCESS:
    Successfully set property key(s): StorPoolName
SUCCESS:
Description:
    New resource 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b' on node 'hf-kubevirt-02' registered.
Details:
    Resource 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b' on node 'hf-kubevirt-02' UUID is: 23522f75-a8a9-4698-a92e-2730c04f03d9
SUCCESS:
Description:
    Volume with number '0' on resource 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b' on node 'hf-kubevirt-02' successfully registered
Details:
    Volume UUID is: 21952b34-9d59-4d16-8a40-6e5d4230a032
SUCCESS:
    Preparing deletion of resource on 'hf-kubevirt-01'
ERROR:
Description:
    (Node: 'hf-kubevirt-03') Failed to create lvm volume
Details:
    Command 'lvcreate --virtualsize 8390440k linstor_data --thinpool thindata --name pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085_00000' returned with exitcode 5.

    Standard out:


    Error message:
      Volume group "linstor_data" not found
      Cannot process volume group linstor_data

Show reports:
    linstor error-reports show 62E22CFB-AC349-000006
ERROR:
Description:
    Deletion of resource 'pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085' on node 'hf-kubevirt-03' failed due to an unknown exception.
Details:
    Node: hf-kubevirt-03, Resource: pvc-7deb78a4-265b-4f1f-904f-b3c7b6009085
Show reports:
    linstor error-reports show 62E228CF-00000-000003
SUCCESS:
    Added peer(s) 'hf-kubevirt-02' to resource 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b' on 'hf-kubevirt-01'
SUCCESS:
    Created resource 'pvc-2e9c2021-197b-433e-9da2-6cec2926865b' on 'hf-kubevirt-02'
ERROR:
Description:
    (Node: 'hf-kubevirt-03') Failed to create lvm volume
Details:
    Command 'lvcreate --virtualsize 1048840k linstor_data --thinpool thindata --name pvc-2e9c2021-197b-433e-9da2-6cec2926865b_00000' returned with exitcode 5.

    Standard out:


    Error message:
      Volume group "linstor_data" not found
      Cannot process volume group linstor_data

Show reports:
    linstor error-reports show 62E22CFB-AC349-000008
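
The evacuate output references several error reports. Here is a small sketch for collecting the distinct report IDs from saved client output, assuming the "Show reports:" lines keep the format seen above. The helper name `report_ids` is hypothetical.

```shell
#!/bin/sh
# Print each distinct error-report ID mentioned in LINSTOR client output on stdin.
report_ids() {
    sed -n 's/.*linstor error-reports show \([0-9A-Za-z-]*\).*/\1/p' | LC_ALL=C sort -u
}

# Sample input: the "Show reports" lines from the evacuate attempt above.
report_ids <<'EOF'
Show reports:
    linstor error-reports show 62E22CFB-AC349-000006
Show reports:
    linstor error-reports show 62E228CF-00000-000003
Show reports:
    linstor error-reports show 62E22CFB-AC349-000008
EOF
```

Each printed ID can then be passed to `linstor error-reports show`, as the output itself suggests.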

kvaps, Jul 29 '22 08:07