piraeus-operator
StorageException: Failed to mkfs /dev/drbd1002
After installing piraeus-operator I get the error message StorageException: Failed to mkfs /dev/drbd1002.
Kubernetes version: v1.28.8
Piraeus operator: v2.3.0
Piraeus server: v1.25.1
Linstor is installed with the following satellite configuration:
---
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: linstor-fast
spec:
  internalTLS:
    certManager:
      name: linstor-internal-ca
      kind: Issuer
  storagePools:
    - name: vg01-linstor
      lvmThinPool:
        volumeGroup: vg01
        thinPool: linstor
After installation I get a number of errors:
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id ┊ Datetime ┊ Node ┊ Exception ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 6610156F-8EC88-000000 ┊ 2024-04-05 15:15:30 ┊ S|k8s-m2 ┊ StorageException: Failed to mkfs /dev/drbd1002 ┊
┊ 66101520-00000-000000 ┊ 2024-04-05 15:15:32 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no... ┊
┊ 66101520-00000-000001 ┊ 2024-04-05 15:15:35 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no... ┊
┊ 66101520-00000-000002 ┊ 2024-04-05 15:15:42 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no... ┊
┊ 66101589-E5863-000000 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000001 ┊ 2024-04-05 15:15:52 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000003 ┊ 2024-04-05 15:15:52 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no... ┊
┊ 66101589-E5863-000001 ┊ 2024-04-05 15:15:59 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000002 ┊ 2024-04-05 15:16:04 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000002 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000003 ┊ 2024-04-05 15:16:08 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101520-00000-000004 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: (Node: 'k8s-m2') Generated resource file for resource 'pv... ┊
┊ 66101520-00000-000005 ┊ 2024-04-05 15:16:09 ┊ C|linstor-controller-5f594b5b45-9lr8z ┊ ApiRcException: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on no... ┊
┊ 6610156F-8EC88-000004 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000003 ┊ 2024-04-05 15:16:12 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000005 ┊ 2024-04-05 15:16:24 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000006 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000004 ┊ 2024-04-05 15:16:43 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000007 ┊ 2024-04-05 15:16:44 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000008 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000005 ┊ 2024-04-05 15:17:42 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 661015A1-A3732-000000 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
┊ 6610156F-8EC88-000009 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m2 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
┊ 661015A1-A3732-000001 ┊ 2024-04-05 15:18:59 ┊ S|k8s-m1 ┊ SSLException: closing inbound before receiving peer's close_notify ┊
┊ 66101589-E5863-000006 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000010 ┊ 2024-04-05 15:19:00 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000011 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000007 ┊ 2024-04-05 15:19:01 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000012 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000008 ┊ 2024-04-05 15:19:27 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000013 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000009 ┊ 2024-04-05 15:19:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000010 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000014 ┊ 2024-04-05 15:20:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000011 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000015 ┊ 2024-04-05 15:22:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000016 ┊ 2024-04-05 15:23:14 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000017 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000012 ┊ 2024-04-05 15:27:58 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000018 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000013 ┊ 2024-04-05 15:37:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000014 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000019 ┊ 2024-04-05 16:07:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000020 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000015 ┊ 2024-04-05 17:07:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000021 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000016 ┊ 2024-04-05 21:07:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000022 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000017 ┊ 2024-04-06 21:07:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 66101589-E5863-000018 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m0 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
┊ 6610156F-8EC88-000023 ┊ 2024-04-07 21:07:57 ┊ S|k8s-m2 ┊ StorageException: Generated resource file for resource 'pvc-80745669-9bf4-4... ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
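With this many reports, it can help to group them by node and exception class before opening individual ones. The following is a minimal Python sketch, assuming the table above has been captured as text; the function name summarize_reports is illustrative and not part of LINSTOR.

```python
from collections import Counter


def summarize_reports(table_text):
    """Count error reports per (node, exception class).

    Expects the text printed by `linstor error-reports list`.
    """
    counts = Counter()
    for line in table_text.splitlines():
        # Data rows separate their four columns with the "┊" character;
        # border lines contain no separator and yield a single cell.
        cells = [c.strip() for c in line.strip().strip("┊").split("┊")]
        if len(cells) != 4 or cells[0] == "Id":
            continue  # skip borders and the header row
        node = cells[2].split("|", 1)[-1]      # drop the "S|" / "C|" prefix
        exception = cells[3].split(":", 1)[0]  # keep only the class name
        counts[(node, exception)] += 1
    return counts
```

Calling summarize_reports(text).most_common() lists the noisiest node and exception combinations first, which narrows down which report IDs to pass to linstor error-reports show.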
Here are the error reports:
-
StorageException: Failed to mkfs /dev/drbd1002
ERROR REPORT 6610156F-8EC88-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:30
Node: k8s-m2
============================================================
Reported error:
===============
Description:
Failed to mkfs /dev/drbd1002
Additional information:
Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1.
Standard out:
Error message:
The file /dev/drbd1002 does not exist and no size was specified.
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69
Error message: Failed to mkfs /dev/drbd1002
Error context:
An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''
ErrorContext: Details: Command 'mkfs.ext4 -q -E nodiscard /dev/drbd1002' returned with exitcode 1.
Standard out:
Error message:
The file /dev/drbd1002 does not exist and no size was specified.
Call backtrace:
Method Native Class:Line number
checkExitCode N com.linbit.extproc.ExtCmdUtils:69
genericExecutor N com.linbit.linstor.layer.storage.utils.Commands:103
genericExecutor N com.linbit.linstor.layer.storage.utils.Commands:63
genericExecutor N com.linbit.linstor.layer.storage.utils.Commands:51
makeFs N com.linbit.linstor.layer.storage.utils.MkfsUtils:96
makeExt4 N com.linbit.linstor.layer.storage.utils.MkfsUtils:109
makeFileSystemOnMarked N com.linbit.linstor.layer.storage.utils.MkfsUtils:222
condInitialOrSkipSync N com.linbit.linstor.layer.drbd.DrbdLayer:1771
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:889
process N com.linbit.linstor.layer.drbd.DrbdLayer:432
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run N java.lang.Thread:829
END OF ERROR REPORT.
-
Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101520-00000-000000
ERROR REPORT 66101520-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:32
Node: linstor-controller-5f594b5b45-9lr8z
Peer: RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: ApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at: Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179
Error message: Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
Error context:
Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
*__checkpoint ? Delete volume definition
Original Stack Trace:
Call backtrace:
Method Native Class:Line number
deleteVolumeDefinitionInTransaction N com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179
Suppressed exception 1 of 1:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'deleteVolumeDefinitionInTransaction', Source file 'CtrlVlmDfnDeleteApiCallHandler.java', Line #179
Error message:
Error has been observed at the following site(s):
*__checkpoint ⇢ Delete volume definition
Original Stack Trace:
Error context:
Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
Call backtrace:
Method Native Class:Line number
deleteVolumeDefinitionInTransaction N com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:179
lambda$deleteVolumeDefinition$0 N com.linbit.linstor.core.apicallhandler.controller.CtrlVlmDfnDeleteApiCallHandler:134
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:149
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:76
call N reactor.core.publisher.MonoCallable:72
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:127
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8759
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request N reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe N reactor.core.publisher.MonoJust:55
subscribe N reactor.core.publisher.MonoDeferContextual:55
subscribe N reactor.core.publisher.Flux:8773
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request N reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe N reactor.core.publisher.MonoJust:55
subscribe N reactor.core.publisher.MonoDeferContextual:55
subscribe N reactor.core.publisher.Mono:4495
subscribeWith N reactor.core.publisher.Mono:4561
subscribe N reactor.core.publisher.Mono:4462
subscribe N reactor.core.publisher.Mono:4398
subscribe N reactor.core.publisher.Mono:4370
doFlux N com.linbit.linstor.api.rest.v1.RequestHelper:324
deleteVolumeDefinition N com.linbit.linstor.api.rest.v1.VolumeDefinitions:229
invoke0 Y jdk.internal.reflect.NativeMethodAccessorImpl:unknown
invoke N jdk.internal.reflect.NativeMethodAccessorImpl:62
invoke N jdk.internal.reflect.DelegatingMethodAccessorImpl:43
invoke N java.lang.reflect.Method:566
lambda$static$0 N org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory:52
run N org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1:146
invoke N org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:189
doDispatch N org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker:159
dispatch N org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher:93
invoke N org.glassfish.jersey.server.model.ResourceMethodInvoker:478
apply N org.glassfish.jersey.server.model.ResourceMethodInvoker:400
apply N org.glassfish.jersey.server.model.ResourceMethodInvoker:81
run N org.glassfish.jersey.server.ServerRuntime$1:256
call N org.glassfish.jersey.internal.Errors$1:248
call N org.glassfish.jersey.internal.Errors$1:244
process N org.glassfish.jersey.internal.Errors:292
process N org.glassfish.jersey.internal.Errors:274
process N org.glassfish.jersey.internal.Errors:244
runInScope N org.glassfish.jersey.process.internal.RequestScope:265
process N org.glassfish.jersey.server.ServerRuntime:235
handle N org.glassfish.jersey.server.ApplicationHandler:684
service N org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer:356
run N org.glassfish.grizzly.http.server.HttpHandler$1:190
doWork N org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:535
run N org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker:515
run N java.lang.Thread:829
END OF ERROR REPORT.
-
Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor error-reports show 66101589-E5863-000000
ERROR REPORT 66101589-E5863-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:15:52
Node: k8s-m0
============================================================
Reported error:
===============
Description:
Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause:
Verification of resource file failed
Additional information:
The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624
Error message: Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
Error context:
An error occurred while processing resource 'Node: 'k8s-m0', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''
ErrorContext: Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause: Verification of resource file failed
Details: The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10
Call backtrace:
Method Native Class:Line number
regenerateResFile N com.linbit.linstor.layer.drbd.DrbdLayer:1624
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:687
process N com.linbit.linstor.layer.drbd.DrbdLayer:432
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run N java.lang.Thread:829
Caused by:
==========
Description:
Execution of the external command 'drbdadm' failed.
Cause:
The external command exited with error code 10.
Correction:
- Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Additional information:
The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop
The external command sent the following output data:
The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m0 { ... }: volume 0 not defined on k8s-m2
command sh-nop exited with code 10
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #642
Error message: The external command 'drbdadm' exited with error code 10
ErrorContext: Description: Execution of the external command 'drbdadm' failed.
Cause: The external command exited with error code 10.
Correction: - Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Details: The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop
The external command sent the following output data:
The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m0 { ... }: volume 0 not defined on k8s-m2
command sh-nop exited with code 10
Call backtrace:
Method Native Class:Line number
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
checkResFile N com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
regenerateResFile N com.linbit.linstor.layer.drbd.DrbdLayer:1617
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:687
process N com.linbit.linstor.layer.drbd.DrbdLayer:432
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run N java.lang.Thread:829
END OF ERROR REPORT.
-
(Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
ERROR REPORT 66101520-00000-000004
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:16:09
Node: linstor-controller-5f594b5b45-9lr8z
Peer: RestClient(10.244.42.135; 'linstor-csi/v1.3.0-4077ebefbe439ee2894b782aa7914b590891d2ff')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: ApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346
Error message: (Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
Error context:
(Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
*__checkpoint ? Modify resource-definition
Original Stack Trace:
Call backtrace:
Method Native Class:Line number
handleAnswer N com.linbit.linstor.proto.CommonMessageProcessor:346
Suppressed exception 1 of 1:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #346
Error message:
Error has been observed at the following site(s):
*__checkpoint ⇢ Modify resource-definition
Original Stack Trace:
Error context:
(Node: 'k8s-m2') Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
Call backtrace:
Method Native Class:Line number
handleAnswer N com.linbit.linstor.proto.CommonMessageProcessor:346
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:293
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:244
lambda$doProcessMessage$4 N com.linbit.linstor.proto.CommonMessageProcessor:229
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8773
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
drainFused N reactor.core.publisher.SinkManyUnicast:319
drain N reactor.core.publisher.SinkManyUnicast:362
tryEmitNext N reactor.core.publisher.SinkManyUnicast:237
tryEmitNext N reactor.core.publisher.SinkManySerialized:100
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:392
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:227
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:185
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:829
END OF ERROR REPORT.
-
Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
ERROR REPORT 6610156F-8EC88-000004
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.25.1
Build ID: 918d21837aefab23c28a52e8fcb0af14033d9bcb
Build time: 2023-11-20T10:09:08+00:00
Error time: 2024-04-05 15:16:12
Node: k8s-m2
============================================================
Reported error:
===============
Description:
Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause:
Verification of resource file failed
Additional information:
The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'regenerateResFile', Source file 'DrbdLayer.java', Line #1624
Error message: Generated resource file for resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' is invalid.
Error context:
An error occurred while processing resource 'Node: 'k8s-m2', Rsc: 'pvc-80745669-9bf4-4776-9865-f6f419c57863''
ErrorContext: Description: Operations on resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' were aborted
Cause: Verification of resource file failed
Details: The error reported by the runtime environment or operating system is:
The external command 'drbdadm' exited with error code 10
Call backtrace:
Method Native Class:Line number
regenerateResFile N com.linbit.linstor.layer.drbd.DrbdLayer:1624
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:687
process N com.linbit.linstor.layer.drbd.DrbdLayer:432
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run N java.lang.Thread:829
Caused by:
==========
Description:
Execution of the external command 'drbdadm' failed.
Cause:
The external command exited with error code 10.
Correction:
- Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Additional information:
The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop
The external command sent the following output data:
The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m2 { ... }: volume 0 missing (present on k8s-m0)
command sh-nop exited with code 10
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'execute', Source file 'DrbdAdm.java', Line #642
Error message: The external command 'drbdadm' exited with error code 10
ErrorContext: Description: Execution of the external command 'drbdadm' failed.
Cause: The external command exited with error code 10.
Correction: - Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Details: The full command line executed was:
drbdadm --config-to-test /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res_tmp --config-to-exclude /var/lib/linstor.d/pvc-80745669-9bf4-4776-9865-f6f419c57863.res sh-nop
The external command sent the following output data:
The external command sent the following error information:
/etc/drbd.conf:54: in resource pvc-80745669-9bf4-4776-9865-f6f419c57863, on k8s-m2 { ... }: volume 0 missing (present on k8s-m0)
command sh-nop exited with code 10
Call backtrace:
Method Native Class:Line number
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:642
execute N com.linbit.linstor.layer.drbd.utils.DrbdAdm:625
checkResFile N com.linbit.linstor.layer.drbd.utils.DrbdAdm:492
regenerateResFile N com.linbit.linstor.layer.drbd.DrbdLayer:1617
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:687
process N com.linbit.linstor.layer.drbd.DrbdLayer:432
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:938
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:383
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:181
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:328
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1156
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:756
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:650
run N java.lang.Thread:829
END OF ERROR REPORT.
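The lines that matter in these satellite reports are the ones drbdadm wrote to stderr, which are buried between the description and the backtrace. As a rough sketch, assuming a report has been saved as text, a small Python helper can pull them out; the function name external_command_errors is hypothetical, and the stop markers are taken from the layout of the reports shown here.

```python
MARKER = "The external command sent the following error information:"


def external_command_errors(report_text):
    """Collect the distinct stderr lines a report quotes after MARKER."""
    found = []
    lines = [line.strip() for line in report_text.splitlines()]
    for i, line in enumerate(lines):
        if line != MARKER:
            continue
        for follow in lines[i + 1:]:
            # The quoted stderr ends at a blank line or the next report field.
            if not follow or follow.startswith(("Category:", "Call backtrace:")):
                break
            if follow not in found:  # reports repeat the same block twice
                found.append(follow)
    return found
```

For the report from k8s-m0 above, this returns the "volume 0 not defined on k8s-m2" line followed by the sh-nop exit code line.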
Output of LVM's pvs; vgs; lvs; on the cluster nodes:
k8s-m0:
PV VG Fmt Attr PSize PFree
/dev/sda2 vg00 lvm2 a-- <99,50g <49,50g
/dev/sdb vg01 lvm2 a-- <50,00g 516,00m
VG #PV #LV #SN Attr VSize VFree
vg00 1 1 0 wz--n- <99,50g <49,50g
vg01 1 2 0 wz--n- <50,00g 516,00m
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root vg00 -wi-ao---- 50,00g
linstor vg01 twi-aotz-- 49,39g 0,01 10,44
pvc-80745669-9bf4-4776-9865-f6f419c57863_00000 vg01 Vwi-a-tz-- 10,00g linstor 0,01
k8s-m1:
PV VG Fmt Attr PSize PFree
/dev/sda2 vg00 lvm2 a-- <99,50g <49,50g
/dev/sdb vg01 lvm2 a-- <50,00g 516,00m
VG #PV #LV #SN Attr VSize VFree
vg00 1 1 0 wz--n- <99,50g <49,50g
vg01 1 2 0 wz--n- <50,00g 516,00m
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root vg00 -wi-ao---- 50,00g
linstor vg01 twi-aotz-- 49,39g 0,43 10,58
pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz-- 8,00g linstor 2,68
k8s-m2:
PV VG Fmt Attr PSize PFree
/dev/sda2 vg00 lvm2 a-- <99,50g <49,50g
/dev/sdb vg01 lvm2 a-- <50,00g 516,00m
VG #PV #LV #SN Attr VSize VFree
vg00 1 1 0 wz--n- <99,50g <49,50g
vg01 1 3 0 wz--n- <50,00g 516,00m
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root vg00 -wi-ao---- 50,00g
linstor vg01 twi-aotz-- 49,39g 0,83 10,70
pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe_00000 vg01 Vwi-aotz-- 5,00g linstor 2,91
pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60_00000 vg01 Vwi-aotz-- 8,00g linstor 3,28
Please try to update to the latest version.
It also looks like this was not a fresh install? Otherwise, why would there be any resources?
This message:
Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
suggests that the resource (which already existed) is still in use somewhere, so something still has it mounted or similar. Clean that up first: check the resource state with linstor r l to find where it is "InUse" and unmount it there.
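For larger clusters, the "InUse" rows can be filtered out of the linstor r l table automatically. The following is a minimal sketch, not part of LINSTOR: find_in_use is a hypothetical helper name, and it assumes the column order shown in this thread (ResourceName, Node, Port, Usage, ...).

```shell
# Print "<resource> <node>" for every row whose Usage column is InUse.
# Assumption: columns are ResourceName, Node, Port, Usage, Conns, State, CreatedOn.
find_in_use() {
  # Turn the table's "┊" separators into "|" so awk can split on a single byte.
  sed 's/┊/|/g' | awk -F'|' '
    $5 ~ /InUse/ {
      gsub(/ /, "", $2)   # resource name
      gsub(/ /, "", $3)   # node name
      print $2, $3
    }'
}

# Usage (no -t, so the output is not mangled by a TTY):
# kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor r l | find_in_use
```

Rows marked "Unused" and the header row are skipped because the match is case-sensitive.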
I will try to upgrade to the latest version, but this is a fresh install. We plan to use Linstor in production, but before that we are doing automated testing: we install a fresh Kubernetes on three VMs and then deploy the piraeus operator via Flux CD. This installation was started on Friday evening; this morning I checked the installation status and found the errors I describe in this issue.
The output of linstor r l:
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m0 ┊ 7002 ┊ ┊ ┊ Unknown ┊ ┊
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m2 ┊ 7002 ┊ InUse ┊ ┊ Unknown ┊ 2024-04-05 15:15:27 ┊
┊ pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe ┊ k8s-m2 ┊ 7000 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:15:24 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m0 ┊ 7001 ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-04-05 15:16:03 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m1 ┊ 7001 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:16:04 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m2 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:16:02 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The PVC pvc-80745669-9bf4-4776-9865-f6f419c57863 is used by the monitoring stack (Grafana), which cannot start:
$ kubectl get pvc -A | grep pvc-80745669-9bf4-4776-9865-f6f419c57863
monitoring kube-prometheus-stack-grafana Bound pvc-80745669-9bf4-4776-9865-f6f419c57863 10Gi RWO linstor-fast 2d17h
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 35h
kube-prometheus-stack-grafana-9b8785fdd-m9nkm 0/3 Init:0/1 0 2d17h
kube-prometheus-stack-kube-state-metrics-776c898f6-qbjj9 1/1 Running 0 47h
kube-prometheus-stack-operator-696cbbfbfb-sql6s 1/1 Running 0 35h
kube-prometheus-stack-prometheus-node-exporter-d96g9 1/1 Running 0 2d17h
kube-prometheus-stack-prometheus-node-exporter-dcdh7 1/1 Running 0 2d17h
kube-prometheus-stack-prometheus-node-exporter-gfblh 1/1 Running 0 2d17h
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 35h
So it looks like 6610156F-8EC88-000000 indicates that mkfs failed because DRBD was not set up correctly. But in 66101520-00000-000000 we can see that the resource is apparently in use. This does not make much sense. It would indicate that something is keeping the resource in primary without any actual disk.
Could you please try to run:
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
It looks like the CSI driver later tried to create the volume again and somehow determined that the volume already exists, which led to it being bound. I would recommend deleting the PVC and PV and letting them be recreated.
Here is output of the commands
$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
pvc-80745669-9bf4-4776-9865-f6f419c57863 role:Primary
$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
resource "pvc-80745669-9bf4-4776-9865-f6f419c57863" {
options {
on-no-data-accessible suspend-io;
on-suspended-primary-outdated force-secondary;
}
_this_host {
node-id 0;
}
}
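The state shown in this output (role:Primary, but no disk line at all) can be detected with a small filter over drbdsetup status. This is a rough sketch under assumptions: drbd_primary_without_disk is a hypothetical helper, and it assumes the local role appears on the first line and the local disk state is reported as a disk:... field, as in DRBD 9 status output.

```shell
# Succeeds (exit 0) when the resource is Primary and reports no local disk field.
# Assumption: first line carries the local role; local disk state appears as "disk:<state>".
drbd_primary_without_disk() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "role:Primary") primary = 1 }
            { for (i = 1; i <= NF; i++) if ($i ~ /^disk:/) disk = 1 }
    END     { exit !(primary && !disk) }
  '
}

# Usage:
# kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status <resource> \
#   | drbd_primary_without_disk && echo "Primary without a backing disk"
```

Peer fields such as peer-disk: are ignored, since only fields starting with disk: count as a local disk.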
OK, this looks like a bug in LINSTOR: it does not properly restore the resource to secondary after the mkfs call fails. That still leaves the question of how /dev/drbd1002 could be missing at this point. I have no idea how that can happen.
To fully clean up the volume:
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup secondary pvc-80745669-9bf4-4776-9865-f6f419c57863
Then run linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863 and delete the PVC and PV.
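The cleanup steps above can be collected into one function. This is only a sketch of the sequence described in this thread: cleanup_stuck_volume and run are hypothetical helper names, the piraeus-datastore namespace is taken from the commands used here, and the PV is assumed to carry the same name as the LINSTOR resource. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Args: <resource> <satellite-pod> <pvc-namespace> <pvc-name>
cleanup_stuck_volume() {
  res=$1        # LINSTOR resource name (assumed to equal the PV name)
  sat_pod=$2    # satellite pod on the node where the resource is stuck Primary
  pvc_ns=$3
  pvc_name=$4
  ns=piraeus-datastore

  # 1. Demote the stuck resource, 2. delete the resource definition,
  # 3. delete the PVC, 4. delete the PV if it is still around.
  run kubectl exec -n "$ns" "$sat_pod" -- drbdsetup secondary "$res" &&
  run kubectl exec -n "$ns" deploy/linstor-controller -- linstor rd d "$res" &&
  run kubectl delete pvc -n "$pvc_ns" "$pvc_name" &&
  run kubectl delete pv "$res" --ignore-not-found
}

# Example:
# DRY_RUN=1 cleanup_stuck_volume pvc-80745669-9bf4-4776-9865-f6f419c57863 k8s-m2 monitoring kube-prometheus-stack-grafana
```

Running it once with DRY_RUN=1 makes it easy to review the exact commands before anything is deleted.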
Your last suggestion worked, and I was able to reinstall the monitoring. What would you recommend now? Should I update to the latest version of Piraeus Operator and create a new issue if I get a new error? What steps would help you analyze this error?
Yes, please upgrade and see if it happens again. In case you encounter an issue, run
kubectl exec -it -n piraeus-datastore deploy/linstor-controller -- linstor sos-report create
Then copy the created file from the pod to your host and attach it to the issue.
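One way to copy the file out of the pod is to stream it through kubectl exec, which avoids looking up the controller pod name first. A minimal sketch: fetch_from_controller is a hypothetical helper, the piraeus-datastore namespace is an assumption based on the other commands in this thread, and the remote path must be taken from the output of linstor sos-report create.

```shell
# Copy a file from the linstor-controller pod to the local host.
# Args: <path-inside-pod> <local-destination>
fetch_from_controller() {
  remote_path=$1   # path printed by "linstor sos-report create"
  local_path=$2
  # No -t here: a TTY would corrupt binary data on stdout.
  kubectl exec -n piraeus-datastore deploy/linstor-controller -- \
    cat "$remote_path" > "$local_path"
}

# Usage:
# fetch_from_controller <path-from-sos-report-output> ./sos-report.tar.gz
```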
@WanzenBug, I am currently testing the latest version of Piraeus Operator, v2.5.0, and so far the problem described in this issue has not reoccurred. However, I have just reproduced a problem that I described in another issue: https://github.com/LINBIT/linstor-server/issues/396 . Since I never got a response in the linstor-server project, should I recreate the issue in this (piraeus-operator) project?
Yes, this is an issue more appropriate for the piraeus project.