piraeus-operator
Volume group not found error when using symlink devices
I think the storage preparation section in the docs (https://github.com/piraeusdatastore/piraeus-operator/blob/master/doc/storage.md#preparing-physical-devices) should warn against using symlink devices like the ones in /dev/disk/by-id/ or other /dev/disk/by-X folders.
The current three requirements should be extended with a fourth one stating that the device must not be a symlink:
- are a root device (no partition)
- do not contain partition information
- have more than 1 GiB
Although this change could solve provisioning problems for others, the real solution would be to support persistent device names.
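A quick sanity check before putting a path into devicePaths could look like the sketch below (run as root on each node; /dev/sdb is only an example device):
DEV=/dev/sdb

# must not be a symlink (all /dev/disk/by-* names are symlinks)
[ -L "$DEV" ] && echo "WARNING: $DEV is a symlink to $(readlink -f "$DEV")"

# must be a whole disk without partitions
lsblk -n -o NAME "$DEV" | tail -n +2 | grep -q . && echo "WARNING: $DEV contains partitions"

# must have more than 1 GiB
[ "$(blockdev --getsize64 "$DEV")" -le $((1024*1024*1024)) ] && echo "WARNING: $DEV is not larger than 1 GiB"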
Details
I tried to avoid using /dev/sdX in my devicePaths list for preparing devices, as these names are known to be unsafe for long-term use and one should use persistent device names instead: https://wiki.archlinux.org/index.php/persistent_block_device_naming
When the operator was deployed with a persistent device name (/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1), everything seemed fine. However, when I added a StorageClass and tried to provision a PVC with it, the PVC stayed unbound. I checked the CSI logs and found these lines:
csi-provisioner I1225 09:05:54.115790 1 controller.go:645] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = CreateVolume failed for pvc-0797f634-b26b-4d82-b5e0-d35014deb438: Message: 'Not enough available nodes'; Details: 'Not enough nodes fulfilling the following auto-place criteria:
csi-provisioner * has a deployed storage pool named TransactionList [thinpool]
csi-provisioner * the storage pools have to have at least '5242880' free space
csi-provisioner * the current access context has enough privileges to use the node and the storage pool
csi-provisioner * the node is online
csi-provisioner Auto-place configuration details:
csi-provisioner Additional place count: 3
csi-provisioner Don't place with resource (List): [pvc-0797f634-b26b-4d82-b5e0-d35014deb438]
csi-provisioner Storage pool name: TransactionList [thinpool]
csi-provisioner Layer stack: [DRBD, STORAGE]
csi-provisioner Auto-placing resource: pvc-0797f634-b26b-4d82-b5e0-d35014deb438'
csi-provisioner I1225 09:05:54.115819 1 controller.go:1084] Final error received, removing PVC 0797f634-b26b-4d82-b5e0-d35014deb438 from claims in progress
csi-provisioner W1225 09:05:54.115827 1 controller.go:943] Retrying syncing claim "0797f634-b26b-4d82-b5e0-d35014deb438", failure 7
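(For reference, the lines above come from the csi-provisioner sidecar; something like the command below pulls them, though the namespace and controller workload name are placeholders for whatever your deployment uses.)
# hypothetical example; adjust namespace and deployment name to your setup
kubectl -n kube-system logs deploy/piraeus-op-csi-controller -c csi-provisioner --tail=50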
I started to dig deeper and found these errors via the linstor CLI:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 0 KiB ┊ 0 KiB ┊ True ┊ Error ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
Node: 'node1', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node2', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
ERROR:
Description:
Node: 'node3', storage pool: 'lvm-thin' - Failed to query free space from storage pool
Cause:
Volume group 'linstor_thinpool' not found
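To double-check whether the volume group LINSTOR is looking for exists at all, LVM can be queried directly on each satellite node (names taken from the error above; run as root):
# these should list the VG and the thin pool LV if they were ever created
vgs linstor_thinpool
lvs linstor_thinpool/thinpool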
The related error log:
$ cat /var/log/linstor-satellite/ErrorReport-5FE58E5C-1F7FF-000000.log
ERROR REPORT 5FE58E5C-1F7FF-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.11.0
Build ID: 3367e32d0fa92515efe61f6963767700a8701d98
Build time: 2020-12-18T08:40:35+00:00
Error time: 2020-12-25 07:02:59
Node: node3
============================================================
Reported error:
===============
Description:
Volume group 'linstor_thinpool' not found
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkVgExists', Source file 'LvmUtils.java', Line #398
Error message: Volume group 'linstor_thinpool' not found
Call backtrace:
Method Native Class:Line number
checkVgExists N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:398
checkVolumeGroupEntry N com.linbit.linstor.layer.storage.utils.StorageConfigReader:63
checkConfig N com.linbit.linstor.layer.storage.lvm.LvmProvider:549
checkStorPool N com.linbit.linstor.layer.storage.StorageLayer:396
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:913
getSpaceInfo N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1225
getStoragePoolSpaceInfo N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:279
applyChanges N com.linbit.linstor.core.apicallhandler.StltStorPoolApiCallHandler:235
applyFullSync N com.linbit.linstor.core.apicallhandler.StltApiCallHandler:332
execute N com.linbit.linstor.api.protobuf.FullSync:94
executeNonReactive N com.linbit.linstor.proto.CommonMessageProcessor:525
lambda$execute$13 N com.linbit.linstor.proto.CommonMessageProcessor:500
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8343
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2344
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
slowPath N reactor.core.publisher.FluxArray$ArraySubscription:126
request N reactor.core.publisher.FluxArray$ArraySubscription:99
onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:363
subscribe N reactor.core.publisher.FluxMerge:69
subscribe N reactor.core.publisher.Flux:8357
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
subscribe N reactor.core.publisher.FluxConcatArray:80
subscribe N reactor.core.publisher.InternalFluxOperator:62
subscribe N reactor.core.publisher.FluxDefer:54
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:373
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:834
END OF ERROR REPORT.
I also found this list, and started to wonder how these became sdb:
$ linstor physical-storage l
╭───────────────────────────────────────────╮
┊ Size ┊ Rotational ┊ Nodes ┊
╞═══════════════════════════════════════════╡
┊ 8589934592 ┊ True ┊ node1[/dev/sdb] ┊
┊ ┊ ┊ node2[/dev/sdb] ┊
┊ ┊ ┊ node3[/dev/sdb] ┊
╰───────────────────────────────────────────╯
It seems that it followed the symlink to sdb:
$ ls -la /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
lrwxrwxrwx. 1 root root 9 Dec 25 10:58 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 -> ../../sdb
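Scripted, the same check can be run against every node at once (SSH access to the nodes is assumed here):
# resolve the by-id symlink on each node to see which kernel name it points to
for node in node1 node2 node3; do
  printf '%s: ' "$node"
  ssh "$node" readlink -f /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
done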
Just to make sure that the device meets the three documented requirements, I checked it with fdisk:
$ fdisk /dev/sdb
Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd9639fdb.
Command (m for help): m
Help:
DOS (MBR)
a toggle a bootable flag
b edit nested BSD disklabel
c toggle the dos compatibility flag
Generic
d delete a partition
F list free unpartitioned space
l list known partition types
n add a new partition
p print the partition table
t change a partition type
v verify the partition table
i print information about a partition
Misc
m print this menu
u change display/entry units
x extra functionality (experts only)
Script
I load disk layout from sfdisk script file
O dump disk layout to sfdisk script file
Save & Exit
w write table to disk and exit
q quit without saving changes
Create a new label
g create a new empty GPT partition table
G create a new empty SGI (IRIX) partition table
o create a new empty DOS partition table
s create a new empty Sun partition table
Command (m for help): F
Unpartitioned space /dev/sdb: 8 GiB, 8588886016 bytes, 16775168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Start End Sectors Size
2048 16777215 16775168 8G
Command (m for help): p
Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd9639fdb
Command (m for help): v
Remaining 16777215 unallocated 512-byte sectors.
Command (m for help): i
No partition is defined yet!
Command (m for help): q
So I started to suspect that the problem is the persistent name being a symlink. Since I have no experience with Piraeus/LINSTOR device migration (and don't know whether it is even possible), I removed everything related to this from the cluster, removed all etcd host path volumes, and cleaned everything up. Then I re-deployed the Piraeus operator with just the following change in the Helm values:
storagePools:
  lvmThinPools:
  - devicePaths:
-   - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
+   - /dev/sdb
    name: lvm-thin
    thinVolume: thinpool
    volumeGroup: ""
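For completeness, re-applying a change like this is just a Helm upgrade (release name, chart path, and namespace below are placeholders for whatever the original deployment used):
# hypothetical example; substitute your own release, chart, and namespace
helm upgrade --install piraeus-op ./charts/piraeus \
    --namespace kube-system \
    --values values.yaml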
And it now works fine, all statuses are green:
$ linstor storage-pool l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊
┊ lvm-thin ┊ node1 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node2 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
┊ lvm-thin ┊ node3 ┊ LVM_THIN ┊ linstor_thinpool/thinpool ┊ 7.98 GiB ┊ 7.98 GiB ┊ True ┊ Ok ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
A minor note: after the statuses had turned green, I also needed to change the storagePool option of the StorageClass from thinpool to lvm-thin to make PVC provisioning work, but I think that is unrelated to the failed device preparation.
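For context, the kind of StorageClass I mean looks roughly like this (the metadata name is illustrative, and autoPlace is only assumed from the "Additional place count: 3" in the CSI log above):
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-lvm-thin
provisioner: linstor.csi.linbit.com
parameters:
  # must match the LINSTOR storage pool name, not the thin LV name
  storagePool: lvm-thin
  # assumed from "Additional place count: 3" in the CSI log above
  autoPlace: "3"
EOF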
Thank you for reporting this issue.
You are right in your suspicion that symlinks don't work.
If a storage pool does not exist on a node, the operator will try to:
- Check the output of linstor physical-storage list and see whether a matching device is listed for the current node. If it is, it will use linstor physical-storage create-device-pool ... to prepare the device(s).
- If no matching device was found in the list, the operator assumes that the LVM pools are already present. In that case, it will call linstor storage-pool create ... with the assumption that the LVM metadata already exists.
This implementation has the advantage that storage pools will be re-created in case a node was "lost" for some reason. Such a node would already have the LVM metadata set up by LINSTOR, so it would no longer show up in the linstor physical-storage list output.
In your case, Piraeus also does not find the given device in the list (the symlink path never matches the kernel names reported there), so it jumps directly to step 2. This then fails in the way shown above, as the LVM metadata was never created.
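Sketched very roughly in shell, the decision looks like this (only an illustration of the logic described above, not the actual implementation; NODE and DEV are placeholders and the exact linstor arguments may differ between client versions):
DEV=/dev/sdb
NODE=node1

if linstor physical-storage list | grep -q "$DEV"; then
    # The device still looks "blank" to LINSTOR, so let it create the
    # VG / thin LV and register the storage pool in one go.
    linstor physical-storage create-device-pool \
        --pool-name thinpool --storage-pool lvm-thin \
        lvmthin "$NODE" "$DEV"
else
    # The device is no longer listed, so assume the LVM metadata already
    # exists and only register the storage pool with LINSTOR.
    linstor storage-pool create lvmthin "$NODE" lvm-thin linstor_thinpool/thinpool
fi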
I'll try to think of a way to improve this process. For now, I'll add the additional requirement of no symlinks to the list.
Wow, thanks for the explanation and the PR with the note!
Hopefully the sda/sdb layout will never change for me, as the second drive with the LVM is attached to the KVM machines via a different method (sda: ide0, sdb: scsi0), and ide0 is marked as the boot device.
The PR has been merged; should we leave this issue open for you?
I'll try to think of a way to improve this process
Yeah, you can leave it open.