linstor-server
ApiRcException while creating failed replica and trying to delete it
Sorry for the pictures in this issue; I only have a screen recording. This is another interesting case, possibly connected with https://github.com/LINBIT/linstor-server/issues/333
Before the experiment:
All nodes are online:

Storage pools are ok:

All resources are up to date:

I have a resource:
# linstor r l -r pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-s2 ┊ 7000 ┊ InUse ┊ Ok ┊ Diskless ┊ 2022-12-21 06:22:45 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w1 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2022-12-05 11:18:08 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w2 ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2022-12-05 11:22:11 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Then I create a new replica; the command hangs for a few minutes:
# linstor r c gpnvkc-w2 pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 -s thindata
^C
When I run linstor node list repeatedly, I can see the nodes continuously flipping between the Online, Connected and OFFLINE states.
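The flapping is easier to report as numbers than as a screen recording. Below is a minimal sketch, assuming the node states have already been sampled at intervals (for example by re-running linstor node list and noting the State column each time). The function name `count_flaps` and the sample observations are hypothetical and are not taken from the real cluster.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_flaps(samples: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count how often each node changed state between consecutive samples.

    Each sample maps a node name to the state string seen at one point in time.
    """
    flaps: Counter = Counter()
    previous: Dict[str, str] = {}
    for sample in samples:
        for node, state in sample.items():
            # A change only counts once an earlier observation exists.
            if node in previous and previous[node] != state:
                flaps[node] += 1
            previous[node] = state
    return dict(flaps)


# Hypothetical observations, used only to illustrate the idea.
observed = [
    {"gpnvkc-w1": "Online", "gpnvkc-w2": "Online"},
    {"gpnvkc-w1": "OFFLINE", "gpnvkc-w2": "Online"},
    {"gpnvkc-w1": "Connected", "gpnvkc-w2": "Online"},
    {"gpnvkc-w1": "Online", "gpnvkc-w2": "OFFLINE"},
]
print(count_flaps(observed))
```

A node that never changes state does not appear in the result, so an empty dict means no flapping was observed in the sampled window.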

If I restart the linstor-controller, it seems to start working again and all the nodes go back to Online.
But the newly created resource stays in the Unknown state:
# linstor r l -r pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-s2 ┊ 7000 ┊ InUse ┊ Connecting(gpnvkc-w3) ┊ Diskless ┊ 2022-12-21 06:22:45 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w1 ┊ 7000 ┊ Unused ┊ Connecting(gpnvkc-w3) ┊ UpToDate ┊ 2022-12-05 11:18:08 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w2 ┊ 7000 ┊ Unused ┊ Connecting(gpnvkc-w3) ┊ UpToDate ┊ 2022-12-05 11:22:11 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w3 ┊ 7000 ┊ ┊ ┊ Unknown ┊ ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
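To spot the stuck replica without reading the table by eye, the output can be parsed. Below is a minimal sketch that assumes the table layout printed above (cells separated by `┊`, with the first `┊` line being the header). The helper name `parse_resource_table` is hypothetical, and the sample text is a shortened copy of the listing above.

```python
from typing import Dict, List

SEPARATOR = "┊"

# Shortened copy of the listing above (header plus two rows).
SAMPLE = """\
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w2 ┊ 7000 ┊ Unused ┊ Connecting(gpnvkc-w3) ┊ UpToDate ┊ 2022-12-05 11:22:11 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w3 ┊ 7000 ┊ ┊ ┊ Unknown ┊ ┊
"""


def parse_resource_table(text: str) -> List[Dict[str, str]]:
    """Turn the box-drawn resource table into one dict per data row."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with other box characters; skip them.
        if not line.startswith(SEPARATOR):
            continue
        # Drop the empty fragments before the first and after the last separator.
        rows.append([cell.strip() for cell in line.split(SEPARATOR)[1:-1]])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


unknown_nodes = [
    row["Node"] for row in parse_resource_table(SAMPLE) if row["State"] == "Unknown"
]
print(unknown_nodes)
```

Empty cells, such as the Usage and Conns columns of the gpnvkc-w3 row, come back as empty strings, so the column positions stay aligned with the header.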
Logs:

Error report:

When I try to remove this replica, I get an error:
# linstor r d gpnvkc-w3 pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
SUCCESS:
Description:
Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 preparing for deletion.
Details:
Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 UUID is: 7084c584-24c4-4e5b-8e8d-c139f3a73414
SUCCESS:
Preparing deletion of resource on 'gpnvkc-s2'
ERROR:
Description:
(Node: 'gpnvkc-w3') No response generated by handler.
Details:
In API call 'ChangedRsc'.
SUCCESS:
Preparing deletion of resource on 'gpnvkc-w1'
SUCCESS:
Preparing deletion of resource on 'gpnvkc-w2'
ERROR:
Description:
Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.
Details:
Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
Show reports:
linstor error-reports show 63A43D9B-00000-000000
Error report:
ERROR REPORT 63A43D9B-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-11-07T16:37:38+00:00
Error time: 2022-12-22 11:28:05
Node: linstor-controller-6787cccfbf-l2tlz
Peer: RestClient(10.111.7.14; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')
============================================================
Reported error:
===============
Category: RuntimeException
Class name: DelayedApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils.DelayedApiRcException
Generated at: Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126
Error message: Exceptions have been converted to responses
Error context:
Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.
Asynchronous stage backtrace:
(Node: 'gpnvkc-w3') No response generated by handler.
Error has been observed at the following site(s):
|_ checkpoint ⇢ Prepare resource delete
|_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:
Call backtrace:
Method Native Class:Line number
lambda$mergeExtractingApiRcExceptions$4 N com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
Suppressed exception 1 of 2:
===============
Category: RuntimeException
Class name: ApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at: Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337
Error message: (Node: 'gpnvkc-w3') No response generated by handler.
Error context:
Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.
ApiRcException entries:
Nr: 1
Message: (Node: 'gpnvkc-w3') No response generated by handler.
Details: In API call 'ChangedRsc'.
Call backtrace:
Method Native Class:Line number
handleAnswer N com.linbit.linstor.proto.CommonMessageProcessor:337
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:284
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:383
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:829
Suppressed exception 2 of 2:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126
Error message:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Prepare resource delete
|_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:
Error context:
Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.
Call backtrace:
Method Native Class:Line number
lambda$mergeExtractingApiRcExceptions$4 N com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
checkTerminated N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:191
onComplete N reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber:81
onComplete N reactor.core.publisher.FluxPeek$PeekSubscriber:252
onComplete N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete N reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
complete N reactor.core.publisher.FluxCreate$BaseSink:438
drain N reactor.core.publisher.FluxCreate$BufferAsyncSink:784
complete N reactor.core.publisher.FluxCreate$BufferAsyncSink:732
drainLoop N reactor.core.publisher.FluxCreate$SerializedSink:239
drain N reactor.core.publisher.FluxCreate$SerializedSink:205
complete N reactor.core.publisher.FluxCreate$SerializedSink:196
apiCallComplete N com.linbit.linstor.netcom.TcpConnectorPeer:465
handleComplete N com.linbit.linstor.proto.CommonMessageProcessor:363
handleDataMessage N com.linbit.linstor.proto.CommonMessageProcessor:287
doProcessInOrderMessage N com.linbit.linstor.proto.CommonMessageProcessor:235
lambda$doProcessMessage$3 N com.linbit.linstor.proto.CommonMessageProcessor:220
subscribe N reactor.core.publisher.FluxDefer:46
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:418
drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
drainFused N reactor.core.publisher.UnicastProcessor:286
drain N reactor.core.publisher.UnicastProcessor:329
onNext N reactor.core.publisher.UnicastProcessor:408
next N reactor.core.publisher.FluxCreate$IgnoreSink:618
next N reactor.core.publisher.FluxCreate$SerializedSink:153
processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:383
doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:218
lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:164
onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:177
runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
call N reactor.core.scheduler.WorkerTask:84
call N reactor.core.scheduler.WorkerTask:37
run N java.util.concurrent.FutureTask:264
run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker N java.util.concurrent.ThreadPoolExecutor:1128
run N java.util.concurrent.ThreadPoolExecutor$Worker:628
run N java.lang.Thread:829
END OF ERROR REPORT.
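The report above is long, and the useful part is the chain of exception classes and their messages. Below is a minimal sketch for pulling that chain out of the report text. It assumes only the `Class name:` and `Error message:` line prefixes visible in the report; the function name `summarize_report` is hypothetical, and the sample is a trimmed copy of the report.

```python
from typing import List, Tuple

# Trimmed copy of the report above: only the lines the parser looks at.
SAMPLE = """\
Class name: DelayedApiRcException
Class canonical name: com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils.DelayedApiRcException
Error message: Exceptions have been converted to responses
Suppressed exception 1 of 2:
Class name: ApiRcException
Error message: (Node: 'gpnvkc-w3') No response generated by handler.
Suppressed exception 2 of 2:
Class name: OnAssemblyException
Error message:
"""


def summarize_report(text: str) -> List[Tuple[str, str]]:
    """Return (class name, error message) pairs in the order they appear."""
    pairs: List[Tuple[str, str]] = []
    current_class = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Class name:"):
            current_class = line.split(":", 1)[1].strip()
        elif line.startswith("Error message:") and current_class is not None:
            # Split only on the first colon so colons inside the message survive.
            pairs.append((current_class, line.split(":", 1)[1].strip()))
            current_class = None
    return pairs


for class_name, message in summarize_report(SAMPLE):
    print(f"{class_name}: {message or '(no message)'}")
```

For this report the summary shows that the outer DelayedApiRcException only wraps the real failure, which is the ApiRcException raised for node 'gpnvkc-w3'.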