
ApiRcException while creating failed replica and trying to delete it

Open kvaps opened this issue 3 years ago • 0 comments

Sorry for the pictures in this issue; I only have a screen recording. This is another interesting case, possibly connected with https://github.com/LINBIT/linstor-server/issues/333

Before the experiment:

All nodes are online:

image

Storage pools are ok: image

All resources are up to date:

image

I have a resource:

# linstor r l -r pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node      ┊ Port ┊ Usage  ┊ Conns                 ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-s2 ┊ 7000 ┊ InUse  ┊ Ok                    ┊ Diskless ┊ 2022-12-21 06:22:45 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w1 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2022-12-05 11:18:08 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w2 ┊ 7000 ┊ Unused ┊ Ok                    ┊ UpToDate ┊ 2022-12-05 11:22:11 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Then I create a new replica; the command hangs for a few minutes, so I interrupt it:

# linstor r c gpnvkc-w3 pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 -s thindata
^C

When I run linstor node list repeatedly, I can see the nodes continuously flipping between the Online, Connected and OFFLINE states.

image image image image image
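To capture this flapping as text rather than as screenshots, a loop like the one below could be used. This is only a minimal sketch: `sample_nodes` and `LIST_CMD` are hypothetical names, and the only LINSTOR command it assumes is the `linstor node list` call mentioned above.

```shell
# Hypothetical helper: run a listing command repeatedly and prefix each
# sample with a UTC timestamp, so state changes can be compared later.
# LIST_CMD defaults to the command used in this report; override it to
# sample something else.
LIST_CMD=${LIST_CMD:-"linstor node list"}

sample_nodes() {
    # $1 = number of samples, $2 = seconds to wait between samples
    sn_count=$1
    sn_interval=$2
    sn_i=0
    while [ "$sn_i" -lt "$sn_count" ]; do
        printf '=== %s ===\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # Unquoted on purpose so the command string splits into words.
        $LIST_CMD || echo "(command failed with status $?)"
        sleep "$sn_interval"
        sn_i=$((sn_i + 1))
    done
}
```

For example, `sample_nodes 30 2 > node-states.log` would take 30 samples two seconds apart, which should be enough to see whether a node alternates between states.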

If I restart the linstor-controller, it seems to start working again and all the nodes become Online.

But the newly created resource stays in the Unknown state:

# linstor r l -r pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node      ┊ Port ┊ Usage  ┊ Conns                 ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-s2 ┊ 7000 ┊ InUse  ┊ Connecting(gpnvkc-w3) ┊ Diskless ┊ 2022-12-21 06:22:45 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w1 ┊ 7000 ┊ Unused ┊ Connecting(gpnvkc-w3) ┊ UpToDate ┊ 2022-12-05 11:18:08 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w2 ┊ 7000 ┊ Unused ┊ Connecting(gpnvkc-w3) ┊ UpToDate ┊ 2022-12-05 11:22:11 ┊
┊ pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 ┊ gpnvkc-w3 ┊ 7000 ┊        ┊                       ┊  Unknown ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
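When there are many resources, replicas stuck like this could be picked out of the table with a small filter. This is a sketch under assumptions: `unknown_replicas` is a hypothetical name, and it relies on the column layout shown above (Node in the second column, State in the sixth, separated by `┊`).

```shell
# Hypothetical filter: read a `linstor r l` table on stdin and print the
# node name of every row whose State column is exactly "Unknown".
unknown_replicas() {
    awk -F'┊' '
        # Data rows split into 9 fields: an empty one before the first
        # separator, the 7 columns, and an empty one after the last.
        NF >= 8 {
            node = $3
            state = $7
            gsub(/^[ \t]+|[ \t]+$/, "", node)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state == "Unknown") print node
        }'
}
```

With the table above, `linstor r l -r pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 | unknown_replicas` would be expected to print only `gpnvkc-w3`.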

Logs:

image

Error report:

image

When I try to remove this replica, I get an error:

# linstor r d gpnvkc-w3 pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
SUCCESS:
Description:
    Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 preparing for deletion.
Details:
    Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5 UUID is: 7084c584-24c4-4e5b-8e8d-c139f3a73414
SUCCESS:
    Preparing deletion of resource on 'gpnvkc-s2'
ERROR:
Description:
    (Node: 'gpnvkc-w3') No response generated by handler.
Details:
    In API call 'ChangedRsc'.
SUCCESS:
    Preparing deletion of resource on 'gpnvkc-w1'
SUCCESS:
    Preparing deletion of resource on 'gpnvkc-w2'
ERROR:
Description:
    Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.
Details:
    Node: gpnvkc-w3, Resource: pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5
Show reports:
    linstor error-reports show 63A43D9B-00000-000000

Error report:

ERROR REPORT 63A43D9B-00000-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2022-12-22 11:28:05
Node:                               linstor-controller-6787cccfbf-l2tlz
Peer:                               RestClient(10.111.7.14; 'PythonLinstor/1.15.1 (API1.0.4): Client 1.15.1')

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         DelayedApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils.DelayedApiRcException
Generated at:                       Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126

Error message:                      Exceptions have been converted to responses

Error context:
    Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.

Asynchronous stage backtrace:
    (Node: 'gpnvkc-w3') No response generated by handler.

    Error has been observed at the following site(s):
    	|_ checkpoint ⇢ Prepare resource delete
    	|_ checkpoint ⇢ Activating resource if necessary before deletion
    Stack trace:

Call backtrace:

    Method                                   Native Class:Line number
    lambda$mergeExtractingApiRcExceptions$4  N      com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126

Suppressed exception 1 of 2:
===============
Category:                           RuntimeException
Class name:                         ApiRcException
Class canonical name:               com.linbit.linstor.core.apicallhandler.response.ApiRcException
Generated at:                       Method 'handleAnswer', Source file 'CommonMessageProcessor.java', Line #337

Error message:                      (Node: 'gpnvkc-w3') No response generated by handler.

Error context:
    Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.

ApiRcException entries:
Nr: 1
  Message: (Node: 'gpnvkc-w3') No response generated by handler.
  Details: In API call 'ChangedRsc'.

Call backtrace:

    Method                                   Native Class:Line number
    handleAnswer                             N      com.linbit.linstor.proto.CommonMessageProcessor:337
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:284
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:383
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829

Suppressed exception 2 of 2:
===============
Category:                           RuntimeException
Class name:                         OnAssemblyException
Class canonical name:               reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at:                       Method 'lambda$mergeExtractingApiRcExceptions$4', Source file 'CtrlResponseUtils.java', Line #126

Error message:
Error has been observed at the following site(s):
	|_ checkpoint ⇢ Prepare resource delete
	|_ checkpoint ⇢ Activating resource if necessary before deletion
Stack trace:

Error context:
    Deletion of resource 'pvc-d6a4eeca-52a8-49a5-8693-9d33bd1d29b5' on node 'gpnvkc-w3' failed due to an unknown exception.

Call backtrace:

    Method                                   Native Class:Line number
    lambda$mergeExtractingApiRcExceptions$4  N      com.linbit.linstor.core.apicallhandler.response.CtrlResponseUtils:126
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:207
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    checkTerminated                          N      reactor.core.publisher.FluxFlatMap$FlatMapMain:838
    drainLoop                                N      reactor.core.publisher.FluxFlatMap$FlatMapMain:600
    innerComplete                            N      reactor.core.publisher.FluxFlatMap$FlatMapMain:909
    onComplete                               N      reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
    onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:191
    onComplete                               N      reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber:81
    onComplete                               N      reactor.core.publisher.FluxPeek$PeekSubscriber:252
    onComplete                               N      reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
    onComplete                               N      reactor.core.publisher.FluxMap$MapSubscriber:136
    onComplete                               N      reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber:78
    complete                                 N      reactor.core.publisher.FluxCreate$BaseSink:438
    drain                                    N      reactor.core.publisher.FluxCreate$BufferAsyncSink:784
    complete                                 N      reactor.core.publisher.FluxCreate$BufferAsyncSink:732
    drainLoop                                N      reactor.core.publisher.FluxCreate$SerializedSink:239
    drain                                    N      reactor.core.publisher.FluxCreate$SerializedSink:205
    complete                                 N      reactor.core.publisher.FluxCreate$SerializedSink:196
    apiCallComplete                          N      com.linbit.linstor.netcom.TcpConnectorPeer:465
    handleComplete                           N      com.linbit.linstor.proto.CommonMessageProcessor:363
    handleDataMessage                        N      com.linbit.linstor.proto.CommonMessageProcessor:287
    doProcessInOrderMessage                  N      com.linbit.linstor.proto.CommonMessageProcessor:235
    lambda$doProcessMessage$3                N      com.linbit.linstor.proto.CommonMessageProcessor:220
    subscribe                                N      reactor.core.publisher.FluxDefer:46
    subscribe                                N      reactor.core.publisher.Flux:8357
    onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:418
    drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:414
    drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:679
    onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:243
    drainFused                               N      reactor.core.publisher.UnicastProcessor:286
    drain                                    N      reactor.core.publisher.UnicastProcessor:329
    onNext                                   N      reactor.core.publisher.UnicastProcessor:408
    next                                     N      reactor.core.publisher.FluxCreate$IgnoreSink:618
    next                                     N      reactor.core.publisher.FluxCreate$SerializedSink:153
    processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:383
    doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:218
    lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:164
    onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:177
    runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:439
    run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:526
    call                                     N      reactor.core.scheduler.WorkerTask:84
    call                                     N      reactor.core.scheduler.WorkerTask:37
    run                                      N      java.util.concurrent.FutureTask:264
    run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
    runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1128
    run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:628
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
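Since the report is long, the exception chain can be condensed to one line per exception. The following is a rough sketch, not part of LINSTOR: `summarize_report` is a hypothetical name, and it only assumes the `Class name:` and `Error message:` labels visible in the report above.

```shell
# Hypothetical helper: read an error report on stdin and print
# "<class name>: <error message>" for each exception section.
summarize_report() {
    awk '
        # Remember the most recent class name.
        /^Class name:/    { sub(/^Class name:[ \t]*/, ""); cls = $0 }
        # Emit one summary line when the matching message is reached.
        /^Error message:/ { sub(/^Error message:[ \t]*/, ""); print cls ": " $0 }
    '
}
```

It could be fed from `linstor error-reports show 63A43D9B-00000-000000 | summarize_report`. Where the message continues on the following lines, as for the OnAssemblyException above, only the class name and an empty message would be printed.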

kvaps, Dec 22 '22 16:12