linstor-gateway

Timeout creating large(?) nfs export on slow (hdd) storage

Open · marcellinus77 opened this issue 9 months ago · 1 comment

While trying to create a large export on HDD-backed storage, I get this error report:

linstor-gateway nfs create ls-nfs 1.2.3.4/24 512G -r pve-hdd -f ext4

ERROR REPORT 67E9C422-A21AF-000007

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.30.4
Build ID:                           bef74a44609cb592c5efad2e707b50e696623c61
Build time:                         2025-02-03T15:48:28+00:00
Error time:                         2025-03-31 01:40:14
Node:                               node-4
Thread:                             DeviceManager

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'genericExecutor', Source file 'Commands.java', Line #120

Error message:                      Failed to mfks /dev/drbd1026

Error context:
        An error occurred while processing resource 'Node: 'node-4', Rsc: 'ls-nfs''
ErrorContext:
  Cause:       External command timed out
  Details:     External command: mkfs.ext4 -q -E nodiscard -E root_owner=65534:65534 /dev/drbd1026


Call backtrace:

    Method                                   Native Class:Line number
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:120
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:51
    makeFs                                   N      com.linbit.linstor.storage.utils.MkfsUtils:96
    makeExt4                                 N      com.linbit.linstor.storage.utils.MkfsUtils:109
    makeFileSystemOnMarked                   N      com.linbit.linstor.storage.utils.MkfsUtils:222
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1714
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:743
    processResource                          N      com.linbit.linstor.layer.drbd.DrbdLayer:249
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1368
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1411
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1364
    processResources                         N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:386
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:228
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:333
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1148
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:674
    run                                      N      java.lang.Thread:840

Caused by:
==========

Category:                           Exception
Class name:                         ChildProcessTimeoutException
Class canonical name:               com.linbit.ChildProcessTimeoutException
Generated at:                       Method 'waitFor', Source file 'ChildProcessHandler.java', Line #133


Call backtrace:

    Method                                   Native Class:Line number
    waitFor                                  N      com.linbit.extproc.ChildProcessHandler:133
    syncProcess                              N      com.linbit.extproc.ExtCmd:160
    exec                                     N      com.linbit.extproc.ExtCmd:92
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:79
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:51
    makeFs                                   N      com.linbit.linstor.storage.utils.MkfsUtils:96
    makeExt4                                 N      com.linbit.linstor.storage.utils.MkfsUtils:109
    makeFileSystemOnMarked                   N      com.linbit.linstor.storage.utils.MkfsUtils:222
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1714
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:743
    processResource                          N      com.linbit.linstor.layer.drbd.DrbdLayer:249
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1368
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1411
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1364
    processResources                         N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:386
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:228
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:333
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1148
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:674
    run                                      N      java.lang.Thread:840


END OF ERROR REPORT.

However, the same command with XFS instead of ext4,

linstor-gateway nfs create ls-nfs 1.2.3.4/24 512G -r pve-hdd -f xfs

completes successfully, so I assume it really is a timeout issue.

marcellinus77, Mar 30 '25 23:03

The timeout on the mkfs command is 45 seconds, which is pretty long for "only" a 512G disk. Some debugging questions to figure this out:

  • How long does the second command (nfs create ... -f xfs) take?
  • What about if you do a mkfs.ext4 on the raw disks, without LINSTOR and DRBD involved? How long does that take?
  • If that completes relatively quickly, try a mkfs.ext4 with -E nodiscard to see if that has an effect.
  • What is the network latency and bandwidth between these nodes?
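
One way to collect the timings asked for above is a small wrapper that runs a command and reports its wall-clock duration against the 45-second limit. This is only a sketch: `time_cmd` and `LIMIT_SECONDS` are hypothetical names, `/dev/sdX` is a placeholder for an unused test device on the HDD pool, and the mkfs invocations are left commented out because they destroy any data on the target.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print the elapsed wall-clock seconds,
# and say whether it stayed within the limit (45 s, the mkfs timeout
# mentioned above). Not part of LINSTOR or linstor-gateway.
LIMIT_SECONDS=45

time_cmd() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -gt "$LIMIT_SECONDS" ]; then
        verdict="over"
    else
        verdict="within"
    fi
    echo "elapsed=${elapsed}s exit=${status} (${verdict} ${LIMIT_SECONDS}s limit): $*"
    return "$status"
}

# Example invocations. DESTRUCTIVE: each one wipes the target device.
# /dev/sdX is a placeholder, not a device taken from this issue.
# time_cmd mkfs.ext4 -q -F /dev/sdX
# time_cmd mkfs.ext4 -q -F -E nodiscard /dev/sdX
# time_cmd mkfs.xfs -f /dev/sdX
```

Running the ext4 variants with and without `-E nodiscard` on the same device, then the XFS one, gives three numbers that can be compared directly with the 45-second limit.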

Thanks!

chrboe, Apr 01 '25 10:04