Could not create multiple DRBD replicas on top of shared LUN
Hi, I have created a shared LVM storage pool on two nodes by following the documentation:
# linstor sp l -s shared-lun
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| StoragePool | Node | Driver | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State | SharedName |
|===============================================================================================================================================|
| shared-lun | hf-virt-02 | LVM | shared-lun | 6.99 GiB | 10.00 GiB | False | Ok | Q8lSH2-axOB-mF5p-xGaL-zNOm-pkY8-GSCqY3 |
| shared-lun | hf-virt-03 | LVM | shared-lun | 6.99 GiB | 10.00 GiB | False | Ok | Q8lSH2-axOB-mF5p-xGaL-zNOm-pkY8-GSCqY3 |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
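For context, the pool was created roughly like this, with the VG shared-lun on the shared LUN already visible on both nodes (a sketch from memory; the --external-locking flag is how I recall the shared storage pool documentation and may differ by LINSTOR version):
# linstor storage-pool create lvm hf-virt-02 shared-lun shared-lun --external-locking
# linstor storage-pool create lvm hf-virt-03 shared-lun shared-lun --external-locking
The identical SharedName above looks like the VG UUID, so LINSTOR treats both entries as the same shared space.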
But I can't create more than one diskful DRBD replica on it:
# linstor rd c abcd
# linstor vd c abcd 1G
# linstor r c hf-virt-03 abcd -s shared-lun
# linstor r l -r abcd
+------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|====================================================================================|
| abcd | hf-virt-03 | 7030 | Unused | Ok | UpToDate | 2023-02-27 13:28:46 |
+------------------------------------------------------------------------------------+
# linstor r c hf-virt-02 abcd -s shared-lun
SUCCESS:
Successfully set property key(s): StorPoolName
SUCCESS:
Successfully set property key(s): StorPoolName
INFO:
Tie breaker resource 'abcd' created on DfltDisklessStorPool
INFO:
Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'off' to 'majority' by auto-quorum
INFO:
Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'off' to 'io-error' by auto-quorum
SUCCESS:
Description:
New resource 'abcd' on node 'hf-virt-02' registered.
Details:
Resource 'abcd' on node 'hf-virt-02' UUID is: 262e4235-59e8-4c8f-b81d-6e73db738daf
SUCCESS:
Description:
Volume with number '0' on resource 'abcd' on node 'hf-virt-02' successfully registered
Details:
Volume UUID is: c7522e61-1f58-433d-ad08-94bb8f17ef48
SUCCESS:
Added peer(s) 'hf-virt-02' to resource 'abcd' on 'hf-virt-01'
SUCCESS:
Added peer(s) 'hf-virt-02' to resource 'abcd' on 'hf-virt-03'
ERROR:
(Node: 'hf-virt-02') Failed to adjust DRBD resource abcd
Show reports:
linstor error-reports show 63FCA19A-57D8E-000001
# linstor r l -r abcd
+------------------------------------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|==================================================================================================================|
| abcd | hf-virt-01 | 7030 | Unused | Connecting(hf-virt-02) | TieBreaker | 2023-02-27 13:28:56 |
| abcd | hf-virt-02 | 7030 | Unused | StandAlone(hf-virt-03,hf-virt-01) | Diskless | |
| abcd | hf-virt-03 | 7030 | Unused | Connecting(hf-virt-02) | UpToDate | 2023-02-27 13:28:46 |
+------------------------------------------------------------------------------------------------------------------+
# linstor error-reports show 63FCA19A-57D8E-000001
ERROR REPORT 63FCA19A-57D8E-000001
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.20.3
Build ID: 8d19a891df018f6e3d40538d809904f024bfe361
Build time: 2023-01-27T11:19:21+00:00
Error time: 2023-02-27 13:28:57
Node: hf-virt-02
============================================================
Reported error:
===============
Description:
Failed to adjust DRBD resource abcd
Category: LinStorException
Class name: ResourceException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at: Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #834
Error message: Failed to adjust DRBD resource abcd
Error context:
An error occurred while processing resource 'Node: 'hf-virt-02', Rsc: 'abcd''
Call backtrace:
Method Native Class:Line number
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:834
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:901
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:322
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1152
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:750
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:644
run N java.lang.Thread:829
Caused by:
==========
Description:
Execution of the external command 'drbdadm' failed.
Cause:
The external command exited with error code 1.
Correction:
- Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Additional information:
The full command line executed was:
drbdadm -vvv adjust abcd
The external command sent the following output data:
drbdsetup new-resource abcd 1 --on-no-quorum=io-error --quorum=majority
drbdsetup new-minor abcd 1023 0
drbdsetup new-peer abcd 2 --_name=hf-virt-01 --verify-alg=crct10dif-pclmul --shared-secret=kRLtYYrxX/usF3jZMBJg --cram-hmac-alg=sha1
drbdsetup new-peer abcd 0 --_name=hf-virt-03 --verify-alg=crct10dif-pclmul --shared-secret=kRLtYYrxX/usF3jZMBJg --cram-hmac-alg=sha1
drbdsetup new-path abcd 2 ipv4:95.217.77.33:7030 ipv4:95.217.77.109:7030
drbdsetup new-path abcd 0 ipv4:95.217.77.33:7030 ipv4:95.217.77.30:7030
drbdsetup peer-device-options abcd 2 0 --set-defaults --bitmap=no
drbdmeta 1023 v09 /dev/shared-lun/abcd_00000 internal apply-al
drbdsetup attach 1023 /dev/shared-lun/abcd_00000 /dev/shared-lun/abcd_00000 internal --discard-zeroes-if-aligned=no --rs-discard-granularity=8192
The external command sent the following error information:
New resource abcd
New minor 1023 (vol:0)
1023: Failure: (165) Unclean meta-data found.
You need to 'drbdadm apply-al res'
additional info from kernel:
Found unclean meta data. Did you "drbdadm apply-al"?
Command 'drbdsetup attach 1023 /dev/shared-lun/abcd_00000 /dev/shared-lun/abcd_00000 internal --discard-zeroes-if-aligned=no --rs-discard-granularity=8192' terminated with exit code 10
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #593
Error message: The external command 'drbdadm' exited with error code 1
Call backtrace:
Method Native Class:Line number
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:593
adjust N com.linbit.linstor.layer.drbd.utils.DrbdAdm:90
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:752
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:901
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:322
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1152
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:750
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:644
run N java.lang.Thread:829
END OF ERROR REPORT.
I guess DRBD simply can't work this way over a shared LUN. I think we shouldn't allow creating a second diskful replica in this case.
The proposed solution:
- If create resource -s storage-pool is requested on a storage pool with shared space, then:
  - check whether this storage pool already contains another diskful resource
  - if it does, force creation of a diskless replica and show a warning about it
- If toggle-disk -s storage-pool is requested on a diskless resource in a storage pool with shared space, then:
  - check whether this storage pool already contains another diskful resource
  - turn that diskful replica diskless without removing its backing LV
  - turn the diskless replica diskful on the requested node
- If one of the diskless resources becomes Primary, then:
  - freeze it
  - automatically turn it diskful by executing the procedure above
  - unfreeze it
(A sketch of the manual equivalent of the first point follows this list.)
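For comparison, the first point corresponds to what a user can already do by hand today, i.e. explicitly requesting a diskless replica on the second node (a sketch; the exact flag spelling, --diskless vs --drbd-diskless, depends on the client version):
# linstor resource create hf-virt-02 abcd --diskless
The second and third points have no safe manual equivalent right now, since, as far as I can tell, toggle-disk to diskless removes the backing LV, which on a shared LUN is exactly the data the other node still needs.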
Thanks for the notice.
I think we shouldn't allow creating a second diskful replica in this case.
At least not an active one. The bug that I see here is that Linstor should have created the second diskful resource automatically with --inactive. That would lead to a situation where the second diskful node would NOT have the DRBD device at all (not even as diskless), but the user could linstor r deactivate the first diskful resource and linstor r activate the second resource to "move" the DRBD device to the second node.
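Spelled out with existing client commands, the intended "move" would look roughly like this (a sketch, assuming the second diskful resource on hf-virt-02 already exists but is inactive):
# linstor resource deactivate hf-virt-03 abcd
# linstor resource activate hf-virt-02 abcd
The deactivate tears down the DRBD device on hf-virt-03 while leaving the shared backing LV in place; the activate then brings the resource up on hf-virt-02 on top of that same LV.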
Although I do like the diskless-dancing idea, at least your third point will most likely not be implemented, since that would require some "if this resource gets primary" hook within Linstor, which is not something we intended to do.
The other two points sound like good suggestions but might have some well-hidden problems in the details (e.g. correctly managing the node-ids). However, we will think about them.
PS: Now that I have thought a bit longer about this issue, I think we also have to prohibit having two shared resources with different internal/external metadata settings, as that will definitely cause a lot of trouble.
your third point will most likely not be implemented, since that would require some "if this resource gets primary" hook within Linstor, which is not something we intended to do.
Good point. I think we can implement this additional toggle-disk call in the CSI driver.
This will allow us to handle pod recreation and live migration of virtual machines in a smarter way.
@ghernadi is it possible to perform this procedure on request if one of the resources is already InUse?
E.g. by invoking drbdadm disconnect with --force and then toggling it to diskful?
If toggle-disk -s storage-pool is requested on a diskless resource in a storage pool with shared space, then:
- check whether this storage pool already contains another diskful resource
- turn that diskful replica diskless without removing its backing LV
- turn the diskless replica diskful on the requested node
@ghernadi is it possible to perform this procedure on request if one of the resources is already InUse? E.g. by invoking drbdadm disconnect with --force and then toggling it to diskful?
Honestly, I would like to avoid using disconnect --force as that could quite easily lead to unintended behavior.
So sad; I don't see any way to live-migrate VMs on a shared pool while keeping data locality for them :(
Okay, I did a bit of investigation and found that resources with external metadata can have more than one diskful DRBD replica on a shared LUN. I'm not sure how dangerous this configuration is: all the data is written to the same device twice, but it works without problems. In theory, protocol C should make it safe.
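For reference, my test looked roughly like this (efgh and meta-pool are placeholder names, meta-pool being an ordinary non-shared LVM pool present on both nodes; StorPoolNameDrbdMeta is the property I used to request external DRBD metadata, and its exact name/level may differ by LINSTOR version):
# linstor resource-definition create efgh
# linstor volume-definition create efgh 1G
# linstor resource-definition set-property efgh StorPoolNameDrbdMeta meta-pool
# linstor resource create hf-virt-02 efgh -s shared-lun
# linstor resource create hf-virt-03 efgh -s shared-lun
With each node keeping its DRBD metadata in meta-pool, only the data LV sits on the shared LUN, so the "Unclean meta-data" attach failure from above does not occur.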
Considering that this is the only possible way to live-migrate a VM on a shared LUN and keep data locality, I would suggest the following changes:
- do not allow internal metadata on a shared pool at all (this also covers the case where a user requests different layer stacks for two resources on the same shared pool)
- do not block creation of more than one diskful DRBD replica on a shared pool, but show a warning: "Having two diskful replicas will double the number of write requests on the backing block device"
Ideally, I would love for DRBD to be able to work with a shared meta-disk, or at least to suppress write requests between nodes that share a data disk. That way all the replicas on a shared pool could be diskful (as they really are).
Or some option to tell DRBD that the data between two peers is always Consistent, so that no actual synchronization is performed between them.