Feature request: Handle fail cases caused by missing LVM devices.
Hi, I just ran into an issue while resizing a volume:
# linstor r l -r pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d ┊ monster-killer ┊ 7004 ┊ InUse ┊ Ok ┊ Resizing, UpToDate ┊ 2022-09-13 09:47:31 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I tried to invoke the resize operation manually:
# linstor vd set-size pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d 0 19531250KiB
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d' modified.
Details:
Volume definition with number '0' of resource definition 'pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d' UUID is: 7f380b22-6ece-41cf-9f2b-5032b29c6868
ERROR:
(Node: 'monster-killer') Failed to access DRBD super-block of volume pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/0
Show reports:
linstor error-reports show 635BD872-5C0FA-000126
error report:
# linstor error-reports show 635BD872-5C0FA-000126
ERROR REPORT 635BD872-5C0FA-000126
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.19.1
Build ID: a758bf07796c374fd2004465b0d8690209b74356
Build time: 2022-07-28T04:54:55+00:00
Error time: 2022-11-03 09:52:23
Node: monster-killer
============================================================
Reported error:
===============
Description:
Failed to access DRBD super-block of volume pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/0
Category: LinStorException
Class name: VolumeException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.VolumeException
Generated at: Method 'hasMetaData', Source file 'DrbdLayer.java', Line #1067
Error message: Failed to access DRBD super-block of volume pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/0
Error context:
An error occurred while processing resource 'Node: 'monster-killer', Rsc: 'pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d''
Call backtrace:
Method Native Class:Line number
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1067
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:627
process N com.linbit.linstor.layer.drbd.DrbdLayer:393
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Category: Exception
Class name: NoSuchFileException
Class canonical name: java.nio.file.NoSuchFileException
Generated at: Method 'translateToIOException', Source file 'UnixException.java', Line #92
Error message: /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000
Call backtrace:
Method Native Class:Line number
translateToIOException N sun.nio.fs.UnixException:92
rethrowAsIOException N sun.nio.fs.UnixException:111
rethrowAsIOException N sun.nio.fs.UnixException:116
newFileChannel N sun.nio.fs.UnixFileSystemProvider:182
open N java.nio.channels.FileChannel:292
open N java.nio.channels.FileChannel:345
readObject N com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer:74
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1062
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:627
process N com.linbit.linstor.layer.drbd.DrbdLayer:393
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
END OF ERROR REPORT.
It seems LINSTOR wasn't able to find the /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 device. Okay, let's exec into the pod:
LVM found (already resized):
# lvs | grep pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 linstor -wi-ao---- 18.63g
DRBD found (not resized):
# lsblk /dev/drbd1004
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
drbd1004 147:1004 0 10G 0 disk /var/lib/kubelet/pods/56332201-3640-4de8-9ebb-52244111c406/volumes/kubernetes.io~csi/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/mount
drbdadm adjust doesn't change anything:
# lsblk /dev/drbd1004
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
drbd1004 147:1004 0 10G 0 disk /var/lib/kubelet/pods/56332201-3640-4de8-9ebb-52244111c406/volumes/kubernetes.io~csi/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/mount
# drbdadm adjust pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
# lsblk /dev/drbd1004
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
drbd1004 147:1004 0 10G 0 disk /var/lib/kubelet/pods/56332201-3640-4de8-9ebb-52244111c406/volumes/kubernetes.io~csi/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/mount
drbdadm down/up couldn't complete because of the missing device:
# drbdadm down pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
# drbdadm up pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
open(/dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000) failed: No such file or directory
Command 'drbdmeta 1004 v09 /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 internal apply-al' terminated with exit code 20
command terminated with exit code 1
# drbdadm up pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
Defaulted container "linstor-satellite" out of: linstor-satellite, kube-rbac-proxy, drbd-prometheus-exporter, kernel-module-injector (init)
new-minor pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d 1004 0: sysfs node '/sys/devices/virtual/block/drbd1004' (already? still?) exists
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d: Failure: (161) Minor or volume exists already (delete it first)
Command 'drbdsetup new-minor pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d 1004 0' terminated with exit code 10
command terminated with exit code 1
# drbdadm status pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d role:Secondary
disk:Diskless
lvchange makes the device appear on the node again (though it took a deactivate/activate cycle before the symlink came back):
# lvchange -ay linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000
# ls /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d*
ls: cannot access '/dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d*': No such file or directory
# lvs | grep pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 linstor -wi-a----- 18.63g
# lvchange -an linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000
# ls /dev/linstor/ | grep pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
# lvchange -ay linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000
# ls /dev/linstor/
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000
# drbdadm adjust pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
Moving the internal meta data to its proper location
Internal drbd meta data successfully moved.
# drbdadm status pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d role:Secondary
disk:UpToDate
This is not the first time I've seen LVM devices disappear from a node this way.
Since we can't influence LVM to make it behave more predictably, I suggest a few enhancements in linstor-server to improve diagnostics and troubleshooting:
- Detect a missing backing device path and report the problem (or refuse to run resize and related operations)
- Consider adding some automation for fixing such issues, e.g. if the device is not
InUse, run drbdadm down; lvchange -an; lvchange -ay; drbdadm up. Or is there a better method?
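The detection/recovery idea above could be sketched as a pre-flight check. This is only an illustration of the manual steps from this report, not a LINSTOR API; the ensure_backing_device helper and the sample names are made up:

```shell
#!/bin/sh
# Hypothetical pre-flight check: if the backing device node is missing,
# cycle the LV activation and let DRBD re-attach. Mirrors the manual
# lvchange/drbdadm steps above.
ensure_backing_device() {
    dev="$1"      # expected device node, e.g. /dev/linstor/<res>_00000
    vglv="$2"     # VG/LV name for lvchange, e.g. linstor/<res>_00000
    res="$3"      # DRBD resource name

    if [ -e "$dev" ]; then
        return 0                      # node present, nothing to do
    fi
    echo "backing device $dev is missing, reactivating LV" >&2
    lvchange -an "$vglv" \
        && lvchange -ay "$vglv" \
        && drbdadm adjust "$res"
}

# Example (names taken from the report above):
# ensure_backing_device \
#     /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 \
#     linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 \
#     pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
```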
Today this issue recurred on a different cluster; the resource was stuck resizing because of a missing LV:
# linstor r l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+-------------------------------------------------------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|=====================================================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | slt-dev-kube-system-01 | 7000 | InUse | Ok | Resizing, UpToDate | 2022-10-06 09:32:06 |
+-------------------------------------------------------------------------------------------------------------------------------------+
# linstor vd l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+------------------------------------------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size | Gross | State |
|================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | 0 | 1000 | 100 GiB | | resizing |
+------------------------------------------------------------------------------------------------+
# linstor vd set-size pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 0 100G
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' modified.
Details:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' UUID is: a58b59cd-ce4a-46c2-b9cd-1d7a7eca1b4e
ERROR:
(Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Show reports:
linstor error-reports show 639FE3FF-E8C1E-000009
command terminated with exit code 10
# linstor vd l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+------------------------------------------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size | Gross | State |
|================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | 0 | 1000 | 100 GiB | | resizing |
+------------------------------------------------------------------------------------------------+
# linstor error-reports show 639FE3FF-E8C1E-000009
ERROR REPORT 639FE3FF-E8C1E-000009
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-11-07T16:37:38+00:00
Error time: 2022-12-28 11:16:25
Node: slt-dev-kube-system-01
============================================================
Reported error:
===============
Description:
Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Category: LinStorException
Class name: VolumeException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.VolumeException
Generated at: Method 'hasMetaData', Source file 'DrbdLayer.java', Line #1087
Error message: Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Error context:
An error occurred while processing resource 'Node: 'slt-dev-kube-system-01', Rsc: 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434''
Call backtrace:
Method Native Class:Line number
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1087
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Category: Exception
Class name: NoSuchFileException
Class canonical name: java.nio.file.NoSuchFileException
Generated at: Method 'translateToIOException', Source file 'UnixException.java', Line #92
Error message: /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
Call backtrace:
Method Native Class:Line number
translateToIOException N sun.nio.fs.UnixException:92
rethrowAsIOException N sun.nio.fs.UnixException:111
rethrowAsIOException N sun.nio.fs.UnixException:116
newFileChannel N sun.nio.fs.UnixFileSystemProvider:182
open N java.nio.channels.FileChannel:292
open N java.nio.channels.FileChannel:345
readObject N com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer:74
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1082
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
END OF ERROR REPORT.
I know this is not a LINSTOR issue, but since we rely on existing technologies, we need to know how to live with and work around their bugs.
The issue above was fixed by recreating the symlink manually:
# lvscan | grep pvc
ACTIVE '/dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000' [100.02 GiB] inherit
# ls -lah /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
ls: cannot access '/dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000': No such file or directory
# dmsetup ls | grep pvc
data-pvc--96665a02--7aaa--4f19--b10a--74ec53fac434_00000 (253:0)
# ls -lah /dev/dm-* | grep "253, 0"
brw-rw---- 1 root disk 253, 0 Dec 28 10:06 /dev/dm-0
# ln -s /dev/dm-0 /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
# linstor vd set-size pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 0 100G
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' modified.
Details:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' UUID is: a58b59cd-ce4a-46c2-b9cd-1d7a7eca1b4e
Thus the symlink can be recovered without invoking the lvchange -an; lvchange -ay commands.
@ghernadi the devices are active anyway; can't we automate this so we don't have to rely on the udev daemon?
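For reference, the recreate-the-symlink approach could be automated too. A minimal sketch, assuming the standard device-mapper naming convention (each '-' in a VG or LV name is escaped as '--'); dm_name and restore_symlink are illustrative helpers, not existing tools:

```shell
#!/bin/sh
# Derive the /dev/mapper node name for a VG/LV pair: device-mapper
# escapes '-' inside names as '--', then joins VG and LV with a single '-'.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s' "$vg" "$lv"
}

# Recreate /dev/<vg>/<lv> if it is missing, pointing at the dm node
# (equivalent to the manual "ln -s /dev/dm-0 ..." step above).
restore_symlink() {
    link="/dev/$1/$2"
    [ -e "$link" ] || ln -s "/dev/mapper/$(dm_name "$1" "$2")" "$link"
}
```

For the LV from this report, dm_name reproduces the name shown by dmsetup ls (data-pvc--96665a02--...).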
Today I hit the missing-symlink problem again. I ran into a number of further errors while trying to fix the stuck resize, e.g.:
root@slt-dev-kube-system-02:/# linstor r l
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 ┊ slt-dev-kube-system-01 ┊ 7000 ┊ InUse ┊ Ok ┊ Resizing, UpToDate ┊ 2022-10-06 09:32:06 ┊
┊ pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 ┊ slt-dev-kube-system-02 ┊ 7000 ┊ ┊ Ok ┊ Resizing, Unknown ┊ 2023-01-31 15:38:42 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@slt-dev-kube-system-02:/# linstor r d slt-dev-kube-system-02 pvc-96665a02-7aaa-4f19-b10a-74ec53fac434
SUCCESS:
Description:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 preparing for deletion.
Details:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 UUID is: 8691638c-2caf-4779-a462-a6b54f13cd71
SUCCESS:
Preparing deletion of resource on 'slt-dev-kube-system-02'
ERROR:
(Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Show reports:
linstor error-reports show 63D51331-E8C1E-000017
ERROR:
Description:
Deletion of resource 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' on node 'slt-dev-kube-system-02' failed due to an unknown exception.
Details:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434
Show reports:
linstor error-reports show 63CACC00-00000-000007
linstor error-reports show 63CACC00-00000-000007
ERROR REPORT 63CACC00-00000-000007
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-11-07T16:37:38+00:00
Error time: 2023-02-01 13:58:47
Node: linstor-controller-766b7f6574-h469w
Peer: RestClient(192.168.236.102; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: DelayedApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils.DelayedApiRcException
Generated at: Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126
Error message: Exceptions have been converted to responses
Error context:
Deletion of resource 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' on node 'slt-dev-kube-system-02' failed due to an unknown exception.
Asynchronous stage backtrace:
(Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Error has been observed at the following site(s):
|_ checkpoint ⇢ Prepare resource delete
|_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:
Call backtrace:
Method Native Class:Line number
lambda$mergeExtractingApiRcExceptions$4 N com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
Suppressed exception 1 of 2:
===============
Category: RuntimeException
Class name: ApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337
Error message: (Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Error context:
Deletion of resource 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' on node 'slt-dev-kube-system-02' failed due to an unknown exception.
ApiRcException entries:
Nr: 1
Message: (Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Call backtrace:
Method Native Class:Line number
handleAnswer N com.linbit.linstor.proto.CommonMessageProcessor:337
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:284
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:383
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:829
Suppressed exception 2 of 2:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126
Error message:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Prepare resource delete
|_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:
Error context:
Deletion of resource 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' on node 'slt-dev-kube-system-02' failed due to an unknown exception.
Call backtrace:
Method Native Class:Line number
lambda$mergeExtractingApiRcExceptions$4 N com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
checkTerminated N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
request N reactor.core.publisher.FluxJust$WeakScalarSubscription:101
set N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2152
onSubscribe N reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber:68
subscribe N reactor.core.publisher.FluxJust:70
subscribe N reactor.core.publisher.Flux:8357
onError N reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber:97
onError N reactor.core.publisher.FluxMap$MapSubscriber:126
onError N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2021
onError N reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber:76
onError N reactor.core.publisher.FluxPeek$PeekSubscriber:214
onError N reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber:100
error N reactor.core.publisher.Operators:196
subscribe N reactor.core.publisher.FluxError:43
subscribe N reactor.core.publisher.Flux:8357
onError N reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber:97
onError N reactor.core.publisher.FluxMap$MapSubscriber:126
onError N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2021
error N reactor.core.publisher.FluxCreate$BaseSink:452
drain N reactor.core.publisher.FluxCreate$BufferAsyncSink:781
error N reactor.core.publisher.FluxCreate$BufferAsyncSink:726
drainLoop N reactor.core.publisher.FluxCreate$SerializedSink:229
drain N reactor.core.publisher.FluxCreate$SerializedSink:205
error N reactor.core.publisher.FluxCreate$SerializedSink:181
apiCallError N com.linbit.linstor.netcom.TcpConnectorPeer:451
handleAnswer N com.linbit.linstor.proto.CommonMessageProcessor:349
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:284
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:383
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:829
END OF ERROR REPORT.
linstor error-reports show 63D51331-E8C1E-000017
ERROR REPORT 63D51331-E8C1E-000017
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-11-07T16:37:38+00:00
Error time: 2023-02-01 13:58:46
Node: slt-dev-kube-system-01
============================================================
Reported error:
===============
Description:
Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Category: LinStorException
Class name: VolumeException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.VolumeException
Generated at: Method 'hasMetaData', Source file 'DrbdLayer.java', Line #1087
Error message: Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Error context:
An error occurred while processing resource 'Node: 'slt-dev-kube-system-01', Rsc: 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434''
Call backtrace:
Method Native Class:Line number
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1087
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Category: Exception
Class name: NoSuchFileException
Class canonical name: java.nio.file.NoSuchFileException
Generated at: Method 'translateToIOException', Source file 'UnixException.java', Line #92
Error message: /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
Call backtrace:
Method Native Class:Line number
translateToIOException N sun.nio.fs.UnixException:92
rethrowAsIOException N sun.nio.fs.UnixException:111
rethrowAsIOException N sun.nio.fs.UnixException:116
newFileChannel N sun.nio.fs.UnixFileSystemProvider:182
open N java.nio.channels.FileChannel:292
open N java.nio.channels.FileChannel:345
readObject N com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer:74
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1082
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
END OF ERROR REPORT.
I found that vgscan --mknodes fixes the missing-symlink issue.
So can't we simply run it before the resize attempt when the device is missing?
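A minimal sketch of that pre-flight step; the mknodes_if_missing wrapper is hypothetical, only vgscan --mknodes itself is a real LVM command:

```shell
#!/bin/sh
# Run `vgscan --mknodes` only when the expected device node is absent,
# e.g. as a guard before a resize operation.
mknodes_if_missing() {
    if [ -e "$1" ]; then
        return 0          # device node present, nothing to recreate
    fi
    vgscan --mknodes      # recreate missing /dev nodes for active LVs
}

# Example path from the transcript below:
# mknodes_if_missing /dev/linstor_data/pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d_00000
```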
root@kube-master:~# kubectl -n dev get pvc data-dispace-redis-0 -o jsonpath='{.spec.resources.requests.storage}' && echo
512Mi
root@kube-node-1:~# ls /dev/linstor_data/pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d*
/dev/linstor_data/pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d_00000
root@kube-master:~# kubectl -n dev patch pvc data-dispace-redis-0 --type='json' -p='[{"op": "replace", "path": "/spec/resources/requests/storage", "value":"530Mi"}]'
persistentvolumeclaim/data-dispace-redis-0 patched
root@kube-master:~# kubectl -n dev get pvc data-dispace-redis-0 -o jsonpath='{.spec.resources.requests.storage}' && echo
530Mi
root@kube-node-1:~# ls /dev/linstor_data/pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d*
ls: cannot access '/dev/linstor_data/pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d*': No such file or directory
root@kube-master:~# linstor v l
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node | Resource | StoragePool | VolNr | MinorNr | DeviceName | Allocated | InUse | State |
|===========================================================================================================================================================|
| kube-node-1 | pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d | lvm | 0 | 1004 | /dev/drbd1004 | 532 MiB | Unused | Resizing, UpToDate |
| kube-node-2 | pvc-a1d18874-32dd-4aa1-b965-e1c6494b734d | lvm | 0 | 1004 | /dev/drbd1004 | 532 MiB | InUse | Resizing, UpToDate |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
Seems related to https://github.com/piraeusdatastore/piraeus/commit/9a9e38304a383fb0f13ca58f42f939eb634eac5f and https://bugs.debian.org/932433