linstor-server icon indicating copy to clipboard operation
linstor-server copied to clipboard

Resources stuck on DELETING

Open kvaps opened this issue 6 years ago • 23 comments

I have some problem with removing my resources. I presume it may be same problem connected with long lvm output https://github.com/LINBIT/linstor-server/issues/8 Any way I can't delete my resources even after correcting global_filter for my lvm.conf

satellite's log:

23:49:44.854 [MainWorkerPool-7] ERROR LINSTOR/Satellite - Problem of type 'java.lang.NullPointerException' logged to report number 5BAFAC32-CDA0B-000000

23:49:44.863 [MainWorkerPool-7] ERROR LINSTOR/Satellite - Access to deleted resource [Report number 5BAFAC32-CDA0B-000001]

ErrorReport-5BAFAC32-CDA0B-000000.log ErrorReport-5BAFAC32-CDA0B-000001.log

This problem occurs quite often for my installation.

kvaps avatar Oct 02 '18 07:10 kvaps

After restarting it is still stuck, but now another error, lvm timed out:

The most interesting, that data/tgt108_00000 already not existing in my system

ErrorReport-5BB322C9-CDA0B-000000.log ErrorReport-5BB322C9-CDA0B-000001.log

kvaps avatar Oct 02 '18 07:10 kvaps

I will look into this, but can you please re-upload ErrorReport-5BAFAC32-CDA0B-000000.log, as it has the same content as ErrorReport-5BAFAC32-CDA0B-000001.log

ghernadi avatar Oct 02 '18 07:10 ghernadi

@ghernadi, sure, there is logs from another node, with the same problem:

ErrorReport-5BAF8C77-12FF9-000000.log ErrorReport-5BAF8C77-12FF9-000001.log

kvaps avatar Oct 02 '18 08:10 kvaps

Alright, the NullPointerException is already fixed, included in the next release.

The "Access to deleted resource" is most likely a consequence of the previous NullPointerException. However, the NullPointerException triggered this behaviour, but there is definitely a nasty bug in the TransactionMgr which we have to fix. In other words: Nice catch, thank you for finding this :)

Regarding the timeout of lvremove ... command - our default timeouts are 45 seconds as far as I remember. The other issue you referenced was that the satellite disconnects when any external command takes longer than 5 seconds (which is allowed, even by linstor's assumptions). This 5-second-timeout-disconnect bug is fixed. However, I'm not sure what we can do about the 45 second lvremove ... issue, other than increase linstor's timeout which is not really a preferable choice...

ghernadi avatar Oct 02 '18 08:10 ghernadi

This 5-second-timeout-disconnect bug is fixed.

In the current release? I've tried to repeeat lvremove -f data/tgt108_00000 manually, it was executing little more than 5 second.

Nice catch, thank you for finding this

I like linstor and drbd, and I'm glad to help you, thanks for your work!

kvaps avatar Oct 02 '18 08:10 kvaps

In the current release?

No, sorry, it is fixed internally and will be in the next release...

ghernadi avatar Oct 02 '18 08:10 ghernadi

Hi, we just faced with the same situation (on 1.7.1):

# linstor r l -r pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 -a
+-------------------------------------------------------------------------------------------------+
| ResourceName                             | Node  | Port | Usage  | Conns             |    State |
|=================================================================================================|
| pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | m1c12 | 7426 | Unused | Ok                | UpToDate |
| pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | m1c6  | 7426 | Unused | Connecting(m1c10) | UpToDate |
| pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | m1c12 | 7350 | Unused | Ok                | UpToDate |
| pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | m1c6  | 7350 | Unused | Connecting(m1c5)  | UpToDate |
| pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | m1c12 | 7365 | InUse  | Ok                | UpToDate |
| pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | m1c4  | 7365 | Unused | Ok                | UpToDate |
+-------------------------------------------------------------------------------------------------+
# linstor r d m1c5 pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b
SUCCESS:
Description:
    Node: m1c5, Resource: pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b marked for deletion.
Details:
    Node: m1c5, Resource: pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b UUID is: 48e29d6f-3f80-4615-97a6-2a81b1af0b64
SUCCESS:
    Deleted 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' on 'm1c5'
ERROR:
Description:
    (Node: 'm1c6') Creation of the DRBD configuration file for resource 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' failed due to an I/O error
Cause:
    Creation of the DRBD configuration file failed due to an I/O error
Correction:
    - Check whether enough free space is available for the creation of the file
    - Check whether the application has write access to the target directory
    - Check whether the storage is operating flawlessly
Details:
    The error reported by the runtime environment or operating system is:
    No space left on device
Show reports:
    linstor error-reports show 5EBE5356-85D84-000080
SUCCESS:
    Notified 'm1c12' that 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' is being deleted on Node(s): [m1c5]
command terminated with exit code 10
# linstor error-reports show 5EBE5356-85D84-000080
None
# linstor v l -r pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 -a
+-------------------------------------------------------------------------------------------------------------------------------------------+
| Node  | Resource                                 | StoragePool          | VolNr | MinorNr | DeviceName    | Allocated | InUse  |    State |
|===========================================================================================================================================|
| m1c10 | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | DfltDisklessStorPool |     0 |    1446 | /dev/drbd1446 |           |        |  Unknown |
| m1c12 | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | thindata             |     0 |    1446 | /dev/drbd1446 |  1.41 GiB | Unused | UpToDate |
| m1c6  | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | thindata             |     0 |    1446 | /dev/drbd1446 |  1.41 GiB | Unused | UpToDate |
| m1c12 | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | thindata             |     0 |    1350 | /dev/drbd1350 |  1.40 GiB | Unused | UpToDate |
| m1c5  | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | DfltDisklessStorPool |     0 |    1350 | /dev/drbd1350 |           |        |  Unknown |
| m1c6  | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | thindata             |     0 |    1350 | /dev/drbd1350 |  1.40 GiB | Unused | UpToDate |
| m1c12 | pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | thindata             |     0 |    1365 | /dev/drbd1365 |  3.14 GiB | InUse  | UpToDate |
| m1c4  | pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | thindata             |     0 |    1365 | /dev/drbd1365 |  3.15 GiB | Unused | UpToDate |
+-------------------------------------------------------------------------------------------------------------------------------------------+

The error shows No space left on device but all nodes have enough. ErrorReport is not generated for some reson.

kvaps avatar May 18 '20 14:05 kvaps

my bad No space left on device was meaning rootfs but not a storagepool itself, it was overfilled on m1c6, but anyway I guessing you'd like to handle this error more correct in the future.

kvaps avatar May 18 '20 14:05 kvaps

Hehe, I guess linstor was also not able to generate the ErrorReport due to No space left on device? :)

But yes, I will think about what we could do in such a situation :)

ghernadi avatar May 18 '20 14:05 ghernadi

It used to write to stderr if writing the error report fails due to an IOException, so if that still works, it might be in a journal entry or syslog entry. Provided that those were not out of storage space as well...

raltnoeder avatar Jun 30 '20 10:06 raltnoeder

Hi, I just faced with the similar problem on another installation,

╭───────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node     ┊ Port ┊ Usage ┊ Conns ┊    State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════╡
┊ test2        ┊ proxmox1 ┊ 7077 ┊       ┊ Ok    ┊ DELETING ┊           ┊
╰───────────────────────────────────────────────────────────────────────╯

this command just stuck:

linstor r d proxmox1 test2

tace log:

14:03:25.527 [grizzly-http-server-5] DEBUG LINSTOR/Controller - SYSTEM - REST access api 'DelRsc' from 'RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)')'
14:03:25.527 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Delete resource' start
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Updating RESOURCES's flags from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2)
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - RESOURCES's flags updated from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2)
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Updating VOLUMES's flags from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2 VolNum=0)
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - VOLUMES's flags updated from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2 VolNum=0)
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Delete resource' end
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Update for resource deletion' start
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Update for resource deletion' end
14:03:25.530 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer Node: 'proxmox1', API call 87 'ChangedRsc' send
14:03:25.531 [MainWorkerPool-7] TRACE LINSTOR/Controller - SYSTEM - Peer Node: 'proxmox1', API call 87 answer received (seq 338)

kvaps avatar Jan 22 '21 13:01 kvaps

UPD: After the manual removal via lvremove. I just got an interesting error in linstor, with no error report:

# linstor v l -r test2
╭──────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ proxmox1 ┊ test2    ┊ drbdpool    ┊     0 ┊    1074 ┊ None       ┊           ┊       ┊ Error ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'proxmox1', resource: 'test2', volume: 0 - Device provider threw a storage exception
Cause:
    The volume could not be found on the system.

kvaps avatar Jan 22 '21 13:01 kvaps

Creation of new resource is also stuck:

linstor rd c test4
linstor vd create test4 1G
linstor r c proxmox1 test4  -s drbdpool

controller trace log:

satellite trace log:

kvaps avatar Jan 22 '21 13:01 kvaps

Probably related stack trace:

Jan 22 15:18:04 proxmox1 Satellite[21162]: 15:18:04.607 [MainWorkerPool-11] INFO  LINSTOR/Satellite - SYSTEM - Resource 'test5' created for node 'proxmox1'.
Jan 22 15:18:05 proxmox1 Satellite[21162]: Exception in thread "DeviceManager" java.lang.NoSuchMethodError: java.nio.ByteBuffer.limit(I)Ljava/nio/ByteBuffer;
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer.<clinit>(MdSuperblockBuffer.java:40)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.WipeHandler.quickWipe(WipeHandler.java:47)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.AbsStorageProvider.createVolumes(AbsStorageProvider.java:426)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.AbsStorageProvider.process(AbsStorageProvider.java:299)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.StorageLayer.process(StorageLayer.java:281)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.process(DeviceHandlerImpl.java:789)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.processChild(DrbdLayer.java:452)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.adjustDrbd(DrbdLayer.java:568)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.process(DrbdLayer.java:387)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.process(DeviceHandlerImpl.java:789)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.processResourcesAndSnapshots(DeviceHandlerImpl.java:335)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.dispatchResources(DeviceHandlerImpl.java:145)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.dispatchResources(DeviceManagerImpl.java:268)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.phaseDispatchDeviceHandlers(DeviceManagerImpl.java:924)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.devMgrLoop(DeviceManagerImpl.java:627)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.run(DeviceManagerImpl.java:545)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at java.lang.Thread.run(Thread.java:748)

kvaps avatar Jan 22 '21 14:01 kvaps

fixed by swaping java 1.8 to 1.11

ln -sf /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java /etc/alternatives/java

kvaps avatar Jan 22 '21 14:01 kvaps

Hi!

The trace logs from https://github.com/LINBIT/linstor-server/issues/19#issuecomment-765385008 does not really help as that shows only that the controller set the DELETE flags on rsc and vlm, but not why the satellite does not respond to the update..

Regarding "no error report" from https://github.com/LINBIT/linstor-server/issues/19#issuecomment-765385332, even if Linstor does not tell you "here is the error report", you might still want to check linstor err list, as that will query the satellite for new error reports. If an error report is generated, it should usually be included in the client answer, so we might want to look into this.

https://github.com/LINBIT/linstor-server/issues/19#issuecomment-765405153 again - ErrorReports would be helpful here. Also the satellite-trace.log shows that you have quite a lot of *-deleted LVs. I guess those are from previous tests that somehow failed or got stuck as you described before. You might want to clean them up manually.

Regarding Java version 8 -> 11 from the last 2 comments.. Looks like there are some incompatibilities if a Java class was compiled with 9+ and being run with 8... Apparently Linstor was not compiled with Java 8 (or at least with 8 as target) before packaging. we will look into this, thanks for the hint!

ghernadi avatar Jan 25 '21 06:01 ghernadi

How do you get your packages? self build?

rp- avatar Jan 25 '21 07:01 rp-

@rp- this was proxmox-5 cluster upgraded to proxmox-6, linstor was installed directly from linbit's repository:

# cat /etc/apt/sources.list.d/linbit.list 
deb http://packages.linbit.com/proxmox/ proxmox-6 drbd-9.0

Unfortunately, I've got an access to it already after problem has occurred. The system update and update of linstor components - this is the first thing which I've done there.

kvaps avatar Jan 25 '21 09:01 kvaps

@ghernadi unfortunately I have not found any additional error reports :-(

kvaps avatar Jan 25 '21 09:01 kvaps

Just faced this again:

# linstor r l -r pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node       ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-01 ┊ 7008 ┊        ┊ Ok    ┊ DELETING ┊ 2023-01-09 08:49:06 ┊
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-02 ┊ 7008 ┊ Unused ┊       ┊  Unknown ┊                     ┊
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-03 ┊ 7008 ┊ Unused ┊       ┊  Unknown ┊                     ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
# linstor rd d pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
SUCCESS:
Description:
    Resource definition 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' marked for deletion.
Details:
    Resource definition 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' UUID is: 51dc6292-36c8-4eb9-91b0-495f507b8ea6
SUCCESS:
    Notified 'hf-virt-01' that diskless resources of 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' are being deleted
ERROR:
    (Node: 'hf-virt-02') An unknown exception occurred while processing the resource pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
Show reports:
    linstor error-reports show 63BA4220-57D8E-000054
ERROR:
    (Node: 'hf-virt-03') An unknown exception occurred while processing the resource pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
Show reports:
    linstor error-reports show 63BA421E-94200-000063
command terminated with exit code 10
ERROR REPORT 63BA4220-57D8E-000054

============================================================

Application:                        LINBIT�� LINSTOR
Module:                             Satellite
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2023-01-09 08:53:29
Node:                               hf-virt-02

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'start', Source file 'ProcessBuilder.java', Line #1090


Error context:
    An error occurred while processing resource 'Node: 'hf-virt-02', Rsc: 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d''

Call backtrace:

    Method                                   Native Class:Line number
    start                                    N      java.lang.ProcessBuilder:1090
    start                                    N      java.lang.ProcessBuilder:1071
    exec                                     N      com.linbit.extproc.ExtCmd:128
    exec                                     N      com.linbit.extproc.ExtCmd:89
    getCurrentGID                            N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:318
    allVlmsMetaDataNew                       N      com.linbit.linstor.layer.drbd.DrbdLayer:1738
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1642
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:814
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:396
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
ERROR REPORT 63BA421E-94200-000063

============================================================

Application:                        LINBIT�� LINSTOR
Module:                             Satellite
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2023-01-09 08:53:29
Node:                               hf-virt-03

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'start', Source file 'ProcessBuilder.java', Line #1090


Error context:
    An error occurred while processing resource 'Node: 'hf-virt-03', Rsc: 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d''

Call backtrace:

    Method                                   Native Class:Line number
    start                                    N      java.lang.ProcessBuilder:1090
    start                                    N      java.lang.ProcessBuilder:1071
    exec                                     N      com.linbit.extproc.ExtCmd:128
    exec                                     N      com.linbit.extproc.ExtCmd:89
    getCurrentGID                            N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:318
    allVlmsMetaDataNew                       N      com.linbit.linstor.layer.drbd.DrbdLayer:1738
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1642
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:814
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:396
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.

kvaps avatar Jan 09 '23 08:01 kvaps

is the drbdmeta binary accessible on that node?

rp- avatar Jan 09 '23 09:01 rp-

It is accessible In pod with linstor-satellite

kvaps avatar Jan 09 '23 11:01 kvaps

While it could be triggered by an installation problem, the fact that it is a NullPointerException means that there must be a programming error somewhere on the code path too. It seems that somewhere in the generation of the command to be executed by ExtCmd, either the entire command arguments array or one of its elements is a null reference, because that's what's being handled in ProcessBuilder around line 1090.

raltnoeder avatar Jan 12 '23 13:01 raltnoeder