piraeus-operator
Volume group not found error when using symlink devices
I think the storage preparation section in the docs (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#preparing-physical-devices) should warn against using symlink devices like the ones in /dev/disk/by-id/ or other /dev/disk/by-X folders.
The current three requirements should be extended with a fourth one stating that the device must not be a symlink:
- are a root device (no partition)
- do not contain partition information
- have more than 1 GiB
Although this change could solve provisioning problems for others, the real solution would be to support persistent device names.
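A quick sanity check before putting a path into devicePaths could look like the sketch below (run as root on each node; /dev/sdb is only an example device):
DEV=/dev/sdb

# must not be a symlink (all /dev/disk/by-* names are symlinks)
[ -L "$DEV" ] && echo "WARNING: $DEV is a symlink to $(readlink -f "$DEV")"

# must be a whole disk without partitions
lsblk -n -o NAME "$DEV" | tail -n +2 | grep -q . && echo "WARNING: $DEV contains partitions"

# must have more than 1 GiB
[ "$(blockdev --getsize64 "$DEV")" -le $((1024*1024*1024)) ] && echo "WARNING: $DEV is not larger than 1 GiB"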
Details
I tried to avoid using /dev/sdX in my devicePaths list for preparing devices, as these names are known to be unsafe for long-term use and one should use persistent device names instead: https://wiki.archlinux.org/index.php/persistent_block_device_naming
When the operator was deployed with a persistent device name (/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1), everything seemed fine. However, when I added a StorageClass and tried to provision a PVC with it, the PVC stayed unbound. I checked the CSI logs and found these lines:
csi-provisioner I1225 09:05:54.115790 1 controller.go:645] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = CreateVolume failed for pvc-0797f634-b26b-4d82-b5e0-d35014deb438: Message: 'Not enough available nodes'; Details: 'Not enough nodes fulfilling the following auto-place criteria:
csi-provisioner * has a deployed storage pool named TransactionList [thinpool]
csi-provisioner * the storage pools have to have at least '5242880' free space
csi-provisioner * the current access context has enough privileges to use the node and the storage pool
csi-provisioner * the node is online
csi-provisioner Auto-place configuration details:
csi-provisioner Additional place count: 3
csi-provisioner Don't place with resource (List): [pvc-0797f634-b26b-4d82-b5e0-d35014deb438]
csi-provisioner Storage pool name: TransactionList [thinpool]
csi-provisioner Layer stack: [DRBD, STORAGE]
csi-provisioner Auto-placing resource: pvc-0797f634-b26b-4d82-b5e0-d35014deb438'
csi-provisioner I1225 09:05:54.115819 1 controller.go:1084] Final error received, removing PVC 0797f634-b26b-4d82-b5e0-d35014deb438 from claims in progress
csi-provisioner W1225 09:05:54.115827 1 controller.go:943] Retrying syncing claim "0797f634-b26b-4d82-b5e0-d35014deb438", failure 7
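(For reference, the lines above come from the csi-provisioner sidecar; something like the command below pulls them, though the namespace and controller workload name are placeholders for whatever your deployment uses.)
# hypothetical example; adjust namespace and deployment name to your setup
kubectl -n kube-system logs deploy/piraeus-op-csi-controller -c csi-provisioner --tail=50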
I started to dig deeper and found these errors via the linstor CLI:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
Node: 'node1', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node2', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node3', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
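To double-check whether the volume group LINSTOR is looking for exists at all, LVM can be queried directly on each satellite node (names taken from the error above; run as root):
# these should list the VG and the thin pool LV if they were ever created
vgs linstor_thinpool
lvs linstor_thinpool/thinpool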
The related error log:
$ cat /var/log/linstor-satellite/ErrorReport-5FE58E5C-1F7FF-000000.log
ERROR REPORT 5FE58E5C-1F7FF-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.11.0
Build ID: 3367e32d0fa92515efe61f6963767700a8701d98
Build time: 2020-12-18T08:40:35+00:00
Error time: 2020-12-25 07:02:59
Node: node3
============================================================
Reported error:
===============
Description:
Volume group 'linstor_thinpool' not found
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkVgExists', Source file 'LvmUtils.java', Line #398
Error message: Volume group 'linstor_thinpool' not found
Call backtrace:
Method Native Class:Line number
checkVgExists N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:398
checkVolumeGroupEntry N com.linbit.linstor.layer.storage.utils.StorageConfigReader:63
checkConfig N com.linbit.linstor.layer.storage.lvm.LvmProvider:549
checkStorPool N com.linbit.linstor.layer.storage.StorageLayer:396
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:913
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1225
getStoragePoolSpaceInfo N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:279
applyChanges N com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:235
applyFullSync N com.linbit.linstor.core.apicallhandler.StltApiCallHandler:332
execute N com.linbit.linstor.api.protobuf.FullSync:94
executeNonReactive N com.linbit.linstor.proto.CommonMessageProcessor:525
lambda$execute$13 N com.linbit.linstor.proto.CommonMessageProcessor:500
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8343
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2344
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
slowPath N reactor.core.publisher.FluxArray$ArraySubscription:126
request N reactor.core.publisher.FluxArray$ArraySubscription:99
onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:363
subscribe N reactor.core.publisher.FluxMerge:69
subscribe N reactor.core.publisher.Flux:8357
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
subscribe N reactor.core.publisher.FluxConcatArray:80
subscribe N reactor.core.publisher.InternalFluxOperator:62
subscribe N reactor.core.publisher.FluxDefer:54
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
I also found this list, and started to wonder how these became sdb:
$ linstor physical-storage l
╭───────────────────────────────────────────╮
┊ Size ┊ Rotational ┊ Nodes ┊
╞═══════════════════════════════════════════╡
┊ 8589934592 ┊ True ┊ node1[/dev/sdb] ┊
┊ ┊ ┊ node2[/dev/sdb] ┊
┊ ┊ ┊ node3[/dev/sdb] ┊
╰───────────────────────────────────────────╯
It seems that it followed the symlink to sdb:
$ ls -la /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
lrwxrwxrwx. 1 root root 9 Dec 25 10:58 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
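Scripted, the same check can be run against every node at once (SSH access to the nodes is assumed here):
# resolve the by-id symlink on each node to see which kernel name it points to
for node in node1 node2 node3; do
  printf '%s: ' "$node"
  ssh "$node" readlink -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
done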
Just to make sure that the device meets the three documented requirements, I checked it with fdisk:
$ fdisk /dev/sdb
Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd9639fdb.
Command (m for help): m
Help:
DOS (MBR)
a toggle a bootable flag
b edit nested BSD disklabel
c toggle the dos compatibility flag
Generic
d delete a partition
F list free unpartitioned space
l list known partition types
n add a new partition
p print the partition table
t change a partition type
v verify the partition table
i print information about a partition
Misc
m print this menu
u change display/entry units
x extra functionality (experts only)
Script
I load disk layout from sfdisk script file
O dump disk layout to sfdisk script file
Save & Exit
w write table to disk and exit
q quit without saving changes
Create a new label
g create a new empty GPT partition table
G create a new empty SGI (IRIX) partition table
o create a new empty DOS partition table
s create a new empty Sun partition table
Command (m for help): F
Unpartitioned space /dev/sdb: 8 GiB, 8588886016 bytes, 16775168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Start End Sectors Size
2048 16777215 16775168 8G
Command (m for help): p
Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd9639fdb
Command (m for help): v
Remaining 16777215 unallocated 512-byte sectors.
Command (m for help): i
No partition is defined yet!
Command (m for help): q
So I started to suspect that the problem is the persistent name being a symlink. Since I have no experience with Piraeus/LINSTOR device migration (and don't know whether it is even possible), I removed everything related to this from the cluster, removed all etcd host path volumes, and cleaned everything up. Then I re-deployed the Piraeus operator with just the following change in the Helm values:
storagePools:
  lvmThinPools:
  - devicePaths:
-   - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
+   - /dev/sdb
    name: lvm-thin
    thinVolume: thinpool
    volumeGroup: ""
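For completeness, re-applying a change like this is just a Helm upgrade (release name, chart path, and namespace below are placeholders for whatever the original deployment used):
# hypothetical example; substitute your own release, chart, and namespace
helm upgrade --install piraeus-op ./charts/piraeus \
    --namespace kube-system \
    --values values.yaml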
And it now works fine, all statuses are green:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
A minor note: after the statuses had turned green, I also needed to change the storagePool option of the StorageClass from thinpool to lvm-thin to make PVC provisioning work, but I think that is unrelated to the failed device preparation.
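For context, the kind of StorageClass I mean looks roughly like this (the metadata name is illustrative, and autoPlace is only assumed from the "Additional place count: 3" in the CSI log above):
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-lvm-thin
provisioner: linstor.csi.linbit.com
parameters:
  # must match the LINSTOR storage pool name, not the thin LV name
  storagePool: lvm-thin
  # assumed from "Additional place count: 3" in the CSI log above
  autoPlace: "3"
EOF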
Thank you for reporting this issue.
You are right in your suspicion that symlinks don't work.
If a storage pool does not exist on a node, the operator will try to:
- Check the output of linstor physical-storage list and see whether a matching device is listed for the current node. If it is, it will use linstor physical-storage create-device-pool ... to prepare the device(s).
- If no matching device was found in the list, the operator assumes that the LVM pools are already present. In that case, it will call linstor storage-pool create ... with the assumption that the LVM metadata already exists.
This implementation has the advantage that storage pools will be re-created in case a node was "lost" for some reason. Such a node would already have the LVM metadata set up by LINSTOR, so it would no longer show up in the linstor physical-storage list output.
In your case, Piraeus also does not find the given device in the list (the symlink path never matches the kernel names reported there), so it jumps directly to step 2. This then fails in the way shown above, as the LVM metadata was never created.
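Sketched very roughly in shell, the decision looks like this (only an illustration of the logic described above, not the actual implementation; NODE and DEV are placeholders and the exact linstor arguments may differ between client versions):
DEV=/dev/sdb
NODE=node1

if linstor physical-storage list | grep -q "$DEV"; then
    # The device still looks "blank" to LINSTOR, so let it create the
    # VG / thin LV and register the storage pool in one go.
    linstor physical-storage create-device-pool \
        --pool-name thinpool --storage-pool lvm-thin \
        lvmthin "$NODE" "$DEV"
else
    # The device is no longer listed, so assume the LVM metadata already
    # exists and only register the storage pool with LINSTOR.
    linstor storage-pool create lvmthin "$NODE" lvm-thin linstor_thinpool/thinpool
fi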
I'll try to think of a way to improve this process. For now, I'll add the additional requirement of no symlinks to the list.
Wow, thanks for the explanation and the PR with the note!
Hopefully the sda/sdb layout will never change for me, as the second drive with the LVM is attached to the KVM machines via a different method (sda: ide0, sdb: scsi0), and ide0 is marked as the boot device.
The PR has been merged; should we leave this issue open for you?
I'll try to think of a way to improve this process
Yeah, you can leave it open.