linstor-server
linstor-server copied to clipboard
Target decrypted buffer is too small!
# linstor c v
linstor controller 1.11.0; GIT-hash: 3367e32d0fa92515efe61f6963767700a8701d98
# linstor err list
ERROR:
While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.
Show reports:
linstor error-reports show 60140BD5-00000-000619
# linstor error-reports show 60140BD5-00000-000619
ERROR REPORT 60140BD5-00000-000619
============================================================
Application: LINBIT�� LINSTOR
Module: Controller
Version: 1.11.0
Build ID: 3367e32d0fa92515efe61f6963767700a8701d98
Build time: 2020-12-22T17:52:40+00:00
Error time: 2021-02-02 14:36:48
Node: linstor-controller-5488d7d845-v8557
============================================================
Reported error:
===============
Category: Exception
Class name: InvalidProtocolBufferException
Class canonical name: com.google.protobuf.InvalidProtocolBufferException
Generated at: Method 'truncatedMessage', Source file 'InvalidProtocolBufferException.java', Line #84
Error message: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.
Asynchronous stage backtrace:
Error has been observed at the following site(s):
|_ checkpoint ? Assemble error report list
Stack trace:
Call backtrace:
Method Native Class:Line number
truncatedMessage N com.google.protobuf.InvalidProtocolBufferException:84
Suppressed exception 1 of 1:
===============
Category: RuntimeException
Class name: OnAssemblyException
Class canonical name: reactor.core.publisher.FluxOnAssembly.OnAssemblyException
Generated at: Method 'truncatedMessage', Source file 'InvalidProtocolBufferException.java', Line #84
Error message:
Error has been observed at the following site(s):
|_ checkpoint ��� Assemble error report list
Stack trace:
Call backtrace:
Method Native Class:Line number
truncatedMessage N com.google.protobuf.InvalidProtocolBufferException:84
readRawBytesSlowPathOneChunk N com.google.protobuf.CodedInputStream$StreamDecoder:2923
readBytesSlowPath N com.google.protobuf.CodedInputStream$StreamDecoder:2974
readBytes N com.google.protobuf.CodedInputStream$StreamDecoder:2386
<init> N com.linbit.linstor.proto.responses.MsgErrorReportOuterClass$MsgErrorReport:346
<init> N com.linbit.linstor.proto.responses.MsgErrorReportOuterClass$MsgErrorReport:247
parsePartialFrom N com.linbit.linstor.proto.responses.MsgErrorReportOuterClass$MsgErrorReport$1:2516
parsePartialFrom N com.linbit.linstor.proto.responses.MsgErrorReportOuterClass$MsgErrorReport$1:2510
parsePartialFrom N com.google.protobuf.AbstractParser:215
parsePartialDelimitedFrom N com.google.protobuf.AbstractParser:255
parseDelimitedFrom N com.google.protobuf.AbstractParser:267
parseDelimitedFrom N com.google.protobuf.AbstractParser:272
parseDelimitedFrom N com.google.protobuf.AbstractParser:48
parseDelimitedWithIOException N com.google.protobuf.GeneratedMessageV3:375
parseDelimitedFrom N com.linbit.linstor.proto.responses.MsgErrorReportOuterClass$MsgErrorReport:1219
deserializeErrorReports N com.linbit.linstor.core.apicallhandler.controller.CtrlErrorListApiCallHandler:310
assembleList N com.linbit.linstor.core.apicallhandler.controller.CtrlErrorListApiCallHandler:297
lambda$listErrorReports$11 N com.linbit.linstor.core.apicallhandler.controller.CtrlErrorListApiCallHandler:205
doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:147
lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:75
call N reactor.core.publisher.MonoCallable:91
trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:126
subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
subscribe N reactor.core.publisher.Flux:8343
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
request N reactor.core.publisher.Operators$ScalarSubscription:2344
onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:134
subscribe N reactor.core.publisher.MonoCurrentContext:35
subscribe N reactor.core.publisher.Flux:8357
onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:188
complete N reactor.core.publisher.Operators$MonoSubscriber:1782
onComplete N reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber:121
onComplete N reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner:252
checkTerminated N reactor.core.publisher.FluxFlatMap$FlatMapMain:838
drainLoop N reactor.core.publisher.FluxFlatMap$FlatMapMain:600
innerComplete N reactor.core.publisher.FluxFlatMap$FlatMapMain:909
onComplete N reactor.core.publisher.FluxFlatMap$FlatMapInner:1013
onComplete N reactor.core.publisher.FluxMap$MapSubscriber:136
onComplete N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2016
complete N reactor.core.publisher.Operators:135
subscribe N reactor.core.publisher.FluxEmpty:41
subscribe N reactor.core.publisher.Flux:8357
onError N reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber:97
onError N reactor.core.publisher.Operators$MultiSubscriptionSubscriber:2021
error N reactor.core.publisher.FluxCreate$BaseSink:452
drain N reactor.core.publisher.FluxCreate$BufferAsyncSink:781
error N reactor.core.publisher.FluxCreate$BufferAsyncSink:726
drainLoop N reactor.core.publisher.FluxCreate$SerializedSink:229
drain N reactor.core.publisher.FluxCreate$SerializedSink:205
error N reactor.core.publisher.FluxCreate$SerializedSink:181
connectionClosing N com.linbit.linstor.netcom.TcpConnectorPeer:498
closeConnection N com.linbit.linstor.netcom.TcpConnectorService:1006
closeAllConnections N com.linbit.linstor.netcom.TcpConnectorService:1040
uninitialize N com.linbit.linstor.netcom.TcpConnectorService:1278
run N com.linbit.linstor.netcom.TcpConnectorService:783
run N java.lang.Thread:834
END OF ERROR REPORT.
Ah it is just started working again out of blue sky
Are you really sure that you can now show now the same error report that did not work earlier?
There might be an issue if there is a Report that is quite large... I think the problematic limit is around 16MiB? (in one ErrorReport)
Well, as you can see I was trying to get output of the list of errors. Possible the list was too big, but it is started working after a minute, not sure for what reason. :)
I'll try to investigate this if the problem will repeat
The interesting fact that after this error, controller lost connections with the all satellites:
13:54:20.644 [TaskScheduleService] INFO LINSTOR/Controller - SYSTEM - RetryTask: Failed resource 'one-vm-8738-disk-0' of node 'm13c43' removed from retry.
14:03:23.041 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'one-vm-11373-disk-2' created.
14:39:27.536 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small! [Report number 60194D5A-00000-000001]
15:20:48.626 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Resource 'one-vm-10468-disk-2' is still in use. [Report number 60194D5A-00000-000002]
16:01:30.482 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
16:01:44.462 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
16:01:54.151 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
16:02:05.017 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
while linstor n l
shows all nodes in Online
but linstor sp l
reurns many errors:
┊ thindata ┊ m6c19 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c2 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c20 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c3 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c4 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c5 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c6 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m6c7 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
[...]
WARNING:
Description:
No active connection to satellite 'm10c18'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm10c19'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm10c2'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
[...]
and auto-placing is not working:
# linstor rd auto-place one-vm-10242-disk-0
ERROR:
Description:
Not enough available nodes
Details:
Not enough nodes fulfilling the following auto-place criteria:
* has a deployed storage pool named TransactionList [thindata]
* the storage pools have to have at least '31457280' free space
* the current access context has enough privileges to use the node and the storage pool
* the node is online
Auto-place configuration details:
Additional place count: 1
Replicas on different nodes: TransactionList [Aux/moonshot]
Replicas on same nodes: TransactionList [Aux/opennebula-1]
Don't place with resource (List): [one-vm-10242-disk-0]
Storage pool name: TransactionList [thindata]
Auto-placing resource: one-vm-10242-disk-0
# linstor err show 60194D5A-00000-000001
ERROR REPORT 60194D5A-00000-000001
============================================================
Application: LINBIT�� LINSTOR
Module: Controller
Version: 1.11.0
Build ID: 3367e32d0fa92515efe61f6963767700a8701d98
Build time: 2020-12-22T17:52:40+00:00
Error time: 2021-02-02 14:39:27
Node: linstor-controller-58d8c7956-hpswn
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'unwrapInto', Source file 'SslTcpConnectorPeer.java', Line #241
Error message: Target decrypted buffer is too small!
Call backtrace:
Method Native Class:Line number
unwrapInto N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:241
readData N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:287
read N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:195
run N com.linbit.linstor.netcom.TcpConnectorService:550
run N java.lang.Thread:834
END OF ERROR REPORT.
I'm going to reopen this issue, because today situation repeated:
14:21:42.157 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small! [Report number 601AB05C-00000-000003]
14:22:05.460 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:22:05.545 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:23:09.856 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:23:09.937 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:23:09.938 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:27:06.453 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:27:06.519 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:28:10.757 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:28:10.769 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:28:10.838 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:28:10.839 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:32:07.405 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:32:07.484 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:33:11.633 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:33:11.655 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:33:11.729 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:33:11.730 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:37:08.403 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:37:08.478 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:38:12.564 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:38:12.582 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:38:12.638 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:38:12.657 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:43.433 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:40:43.470 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:40:43.560 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:40:43.570 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:43.603 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:43.670 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:45.438 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:40:45.497 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:40:45.499 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:45.517 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:40:45.567 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:45.579 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:48.314 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:40:48.363 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:48.391 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:40:48.401 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:40:48.528 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:48.586 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:52.975 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:53.103 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:40:53.137 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:40:53.201 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:40:53.289 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:01.633 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:01.700 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:01.713 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:41:01.800 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:18.356 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:41:18.390 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:41:18.400 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:18.429 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:41:18.523 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:18.536 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:51.312 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-9cec1356-83e4-48ae-a0d1-152735bbec9f' created.
14:41:51.323 [grizzly-http-server-1] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-4469d552-b2f2-4373-9eb7-684b4439c1a0' created.
14:41:51.386 [grizzly-http-server-0] INFO LINSTOR/Controller - SYSTEM - New volume definition with number '0' of resource definition 'pvc-024ada25-8d28-4407-bb98-b7ada2900ac2' created.
14:41:51.387 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:51.411 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
14:41:51.438 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Not enough available nodes
Tus after the Target decrypted buffer is too small
error all storage-pools are become to Warning state:
┊ thindata ┊ m14c43 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c44 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c45 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c5 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c6 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c7 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c8 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m14c9 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m15c1 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
┊ thindata ┊ m15c10 ┊ LVM_THIN ┊ data/thindata ┊ ┊ ┊ True ┊ Warning ┊
[...]
WARNING:
Description:
No active connection to satellite 'm9c30'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm9c31'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm9c32'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm9c33'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'm9c34'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
placing of new resource is not working anymore.
This is still sometimes happening in 1.19.0.
The controller starts complaining about broken connections to satellites in the linstor sp l
view. At the same time, all nodes are Online
.
WARNING:
Description:
No active connection to satellite '...'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
Error message and report:
17:21:55.973 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small! [Report number 63287A29-00000-000000]
ERROR REPORT 63287A29-00000-000000
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.19.0
Build ID: 6c838b3897e5a00cff808f7c6402ff3b17aa3c1c
Build time: 2022-07-20T06:46:57+00:00
Error time: 2022-09-19 17:21:55
Node: ...
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'unwrapInto', Source file 'SslTcpConnectorPeer.java', Line #241
Error message: Target decrypted buffer is too small!
Call backtrace:
Method Native Class:Line number
unwrapInto N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:241
readData N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:287
read N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:195
run N com.linbit.linstor.netcom.TcpConnectorService:543
run N java.lang.Thread:750
END OF ERROR REPORT.
OS and JVM:
# cat /etc/redhat-release
AlmaLinux release 8.6 (Sky Tiger)
# java -version
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (build 25.332-b09, mixed mode)
We've noticed that this error becomes much easier to reproduce if something is regularly polling the /metrics
endpoint on the controller.
Restarting the controller usually fixes this issue right away.
We still have this problem on 1.23.0 controller. The same situation
# linstor error-reports show 64BF89A7-00000-000006
ERROR REPORT 64BF89A7-00000-000006
============================================================
Application: LINBIT�� LINSTOR
Module: Controller
Version: 1.23.0
Build ID: 28dbd33ced60d75a2a0562bf5e9bc6b800ae8361
Build time: 2023-06-25T16:48:26+00:00
Error time: 2023-07-25 08:46:45
Node: linstor-controller-76cbd795cf-5d4zp
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'unwrapInto', Source file 'SslTcpConnectorPeer.java', Line #241
Error message: Target decrypted buffer is too small!
Call backtrace:
Method Native Class:Line number
unwrapInto N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:241
readData N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:287
read N com.linbit.linstor.netcom.ssl.SslTcpConnectorPeer:195
run N com.linbit.linstor.netcom.TcpConnectorService:543
run N java.lang.Thread:829
END OF ERROR REPORT.
And some faulty resources (unknown state). Restart of linstor controller had no effect.
08:45:50.261 [MainWorkerPool-4] INFO LINSTOR/Controller - SYSTEM - RetryTask: Failed resource 'pvc-6b14a3ba-fb4a-4085-9c76-5264d9fc7b6a' of node 'ekb99-a-dhw05' removed from retry.
08:45:50.506 [MainWorkerPool-3] INFO LINSTOR/Controller - SYSTEM - RetryTask: Failed resource 'pvc-dd629fb3-18e5-4f90-8d6c-43eea1df1a78' of node 'ekb99-a-dhw08' removed from retry.
08:45:50.555 [MainWorkerPool-5] INFO LINSTOR/Controller - SYSTEM - RetryTask: Failed resource 'pvc-324463d7-70b4-4def-a1fb-b800afd2aaa8' of node 'ekb99-a-dhw06' removed from retry.
08:46:45.245 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small! [Report number 64BF89A7-00000-000006]
08:48:25.058 [grizzly-http-server-0] WARN LINSTOR/Controller - SYSTEM - Resource definition: pvc-5ba9f49f-8159-4713-96bc-83c14a8ddcc3, Volume number: 0 not found.
08:48:25.098 [grizzly-http-server-7] ERROR LINSTOR/Controller - SYSTEM - Resource 'pvc-5ba9f49f-8159-4713-96bc-83c14a8ddcc3' is still in use. [Report number 64BF89A7-00000-000007]
08:50:22.801 [grizzly-http-server-9] WARN LINSTOR/Controller - SYSTEM - Resource definition: pvc-e2b38290-00d6-4b93-8da7-911a8474ad5e, Volume number: 0 not found.
08:50:22.837 [grizzly-http-server-6] ERROR LINSTOR/Controller - SYSTEM - Resource 'pvc-e2b38290-00d6-4b93-8da7-911a8474ad5e' is still in use. [Report number 64BF89A7-00000-000008]
That is strange. I would have expected restarting the controller to "fix" this issue (at least for a while). Can you send me an sos-report? (please find my email in my profile here) Maybe I can find something that could give us a clue.
And as always, if you by any chance find a way to reproduce this issue, that would be very helpful.
Hello! Today we noticed the same behavior in the linstor-controller as when the following error occurs: [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Target decrypted buffer is too small!
Symptoms: On one of the clusters, one of the linstor-satellites is OFFLINE, even though both the pod and the node were rebooted. The output of the linstor node list looked like this:
linstor node list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ ingress-0 ┊ SATELLITE ┊ 10.5.0.23:3367 (SSL) ┊ Online ┊
┊ ingress-1 ┊ SATELLITE ┊ 10.5.0.29:3367 (SSL) ┊ Online ┊
┊ linstor-controller-fb6b746dd-2pr8f ┊ CONTROLLER ┊ 10.111.1.22:3367 (SSL) ┊ OFFLINE (Auto-eviction: Disabled) ┊
┊ linstor-controller-fb6b746dd-s86dl ┊ CONTROLLER ┊ 10.111.7.241:3367 (SSL) ┊ Online ┊
┊ master-0 ┊ SATELLITE ┊ 10.5.0.20:3367 (SSL) ┊ Online ┊
┊ master-1 ┊ SATELLITE ┊ 10.5.0.21:3367 (SSL) ┊ OFFLINE (Auto-eviction: 2023-10-12 10:36:49) ┊
┊ master-2 ┊ SATELLITE ┊ 10.5.0.22:3367 (SSL) ┊ Online ┊
┊ monitoring-0 ┊ SATELLITE ┊ 10.5.0.15:3367 (SSL) ┊ Online ┊
┊ monitoring-1 ┊ SATELLITE ┊ 10.5.0.16:3367 (SSL) ┊ Online ┊
┊ node-0 ┊ SATELLITE ┊ 10.5.0.24:3367 (SSL) ┊ Online ┊
┊ node-1 ┊ SATELLITE ┊ 10.5.0.25:3367 (SSL) ┊ Online ┊
┊ node-2 ┊ SATELLITE ┊ 10.5.0.26:3367 (SSL) ┊ Online ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
There was nothing unusual in the logs of linstor-satellite:
kubectl -n d8-linstor logs linstor-node-s2wfr
Defaulted container "linstor-satellite" out of: linstor-satellite, kube-rbac-proxy, drbd-prometheus-exporter
time="2023-10-12T10:42:05Z" level=info msg="running k8s-await-election" version=v0.3.1
time="2023-10-12T10:42:05Z" level=info msg="not running with leader election"
time="2023-10-12T10:42:05Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startSatellite]'"
Importing keystore /tmp/tmp.Gwas4bfdMz to /etc/linstor/ssl/keystore.jks...
Entry for alias linstor successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Certificate was added to keystore
LINSTOR, Module Satellite
Version: 1.24.2 (60f9b2aa34ad16f7a0eb5ad8c8e306a2616ae061)
Build time: 2023-10-11T07:48:43+00:00
Java Version: 11
Java VM: Ubuntu, Version 11.0.20.1+1-post-Ubuntu-0ubuntu122.04
Operating system: Linux, Version 5.4.0-54-generic
Environment: amd64, 4 processors, 4004 MiB memory reserved for allocations
System components initialization in progress
Loading configuration file "/etc/linstor/linstor_satellite.toml"
10:42:09.346 [main] INFO LINSTOR/Satellite - SYSTEM - ErrorReporter DB first time init.
10:42:09.348 [main] INFO LINSTOR/Satellite - SYSTEM - Log directory set to: '/var/log/linstor-satellite'
10:42:11.415 [Main] INFO LINSTOR/Satellite - SYSTEM - Loading API classes started.
10:42:11.829 [Main] INFO LINSTOR/Satellite - SYSTEM - API classes loading finished: 414ms
10:42:11.830 [Main] INFO LINSTOR/Satellite - SYSTEM - Dependency injection started.
10:42:11.852 [Main] INFO LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
10:42:11.853 [Main] INFO LINSTOR/Satellite - SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
10:42:11.853 [Main] INFO LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
10:42:11.860 [Main] INFO LINSTOR/Satellite - SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
10:42:12.670 [Main] INFO LINSTOR/Satellite - SYSTEM - Dependency injection finished: 840ms
10:42:12.670 [Main] INFO LINSTOR/Satellite - SYSTEM - Cryptography provider: Using default cryptography module
10:42:14.723 [Main] INFO LINSTOR/Satellite - SYSTEM - Initializing main network communications service
10:42:14.723 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
10:42:14.724 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'FileEventService' of type FileEventService
10:42:14.724 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'DrbdEventService-1' of type DrbdEventService
10:42:14.727 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'DrbdEventPublisher-1' of type DrbdEventPublisher
10:42:14.727 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'SnapshotShippingService' of type SnapshotShippingService
10:42:14.727 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'BackupShippingL2LService' of type BackupShippingL2LService
10:42:14.727 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'BackupShippingS3Service' of type BackupShippingS3Service
10:42:14.727 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'DeviceManager' of type DeviceManager
10:42:14.728 [Main] INFO LINSTOR/Satellite - SYSTEM - Starting service instance 'CloneService' of type CloneService
10:42:14.870 [Main] INFO LINSTOR/Satellite - SYSTEM - SSLNetComService started on port /0:0:0:0:0:0:0:0:3367
We noticed that the sos report has outdated DRBD version data (version 9.2.4) for nodes master-0 and master-2 (there was no data for master-1 at all), although at the time of the sos report formation, the version of DRBD was already 9.2.5.
The command linstor sp l returned a connection error to the nodes:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StoragePool | Node | Driver | PoolName | FreeCapacity | TotalCapacity | CanSnapshots | State | SharedName |
|=========================================================================================================================================================================|
| DfltDisklessStorPool | ingress-0 | DISKLESS | | | | False | Warning | ingress-0;DfltDisklessStorPool |
| DfltDisklessStorPool | ingress-1 | DISKLESS | | | | False | Warning | ingress-1;DfltDisklessStorPool |
| DfltDisklessStorPool | master-0 | DISKLESS | | | | False | Warning | master-0;DfltDisklessStorPool |
| DfltDisklessStorPool | master-1 | DISKLESS | | | | False | Warning | master-1;DfltDisklessStorPool |
| DfltDisklessStorPool | master-2 | DISKLESS | | | | False | Warning | master-2;DfltDisklessStorPool |
| DfltDisklessStorPool | monitoring-0 | DISKLESS | | | | False | Warning | monitoring-0;DfltDisklessStorPool |
| DfltDisklessStorPool | monitoring-1 | DISKLESS | | | | False | Warning | monitoring-1;DfltDisklessStorPool |
| DfltDisklessStorPool | node-0 | DISKLESS | | | | False | Warning | node-0;DfltDisklessStorPool |
| DfltDisklessStorPool | node-1 | DISKLESS | | | | False | Warning | node-1;DfltDisklessStorPool |
| DfltDisklessStorPool | node-2 | DISKLESS | | | | False | Warning | node-2;DfltDisklessStorPool |
| data | node-0 | LVM | data_project | | | False | Warning | node-0;data |
| data | node-1 | LVM | data_project | | | False | Warning | node-1;data |
| data | node-2 | LVM | data_project | | | False | Warning | node-2;data |
| monitoring | monitoring-0 | LVM | d8_monitoring | | | False | Warning | monitoring-0;monitoring |
| monitoring | monitoring-1 | LVM | d8_monitoring | | | False | Warning | monitoring-1;monitoring |
| thindata | node-0 | LVM_THIN | data_project_thin/thindata | | | True | Warning | node-0;thindata |
| thindata | node-1 | LVM_THIN | data_project_thin/thindata | | | True | Warning | node-1;thindata |
| thindata | node-2 | LVM_THIN | data_project_thin/thindata | | | True | Warning | node-2;thindata |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
WARNING:
Description:
No active connection to satellite 'node-0'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'node-1'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'node-2'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'ingress-0'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'ingress-1'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'master-0'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'master-1'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'master-2'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'monitoring-0'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
WARNING:
Description:
No active connection to satellite 'monitoring-1'
Details:
The controller is trying to (re-) establish a connection to the satellite. The controller stored the changes and as soon the satellite is connected, it will receive this update.
It appears that the linstor-controller simply stopped collecting any data from the linstor-satellite. There was such an error in the controller logs:
2023_10_12 09:37:19.697 [SslConnector] ERROR LINSTOR/Controller - SYSTEM - Problem of type 'java.lang.NullPointerException' logged to report number 6526B03E-00000-000003
2023_10_12 09:37:19.699 [SslConnector] ERROR reactor.core.publisher.Operators - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: com.linbit.linstor.netcom.PeerClosingConnectionException
Caused by: com.linbit.linstor.netcom.PeerClosingConnectionException: null
at com.linbit.linstor.netcom.TcpConnectorPeer.connectionClosing(TcpConnectorPeer.java:526)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ? Fetch thin capacity info
Original Stack Trace:
at com.linbit.linstor.netcom.TcpConnectorPeer.connectionClosing(TcpConnectorPeer.java:526)
at com.linbit.linstor.netcom.TcpConnectorService.closeConnection(TcpConnectorService.java:999)
at com.linbit.linstor.netcom.TcpConnectorService.closeAllConnections(TcpConnectorService.java:1034)
at com.linbit.linstor.netcom.TcpConnectorService.uninitialize(TcpConnectorService.java:1287)
at com.linbit.linstor.netcom.TcpConnectorService.run(TcpConnectorService.java:777)
at java.base/java.lang.Thread.run(Thread.java:829)
2023_10_12 09:37:19.702 [SslConnector] ERROR reactor.core.publisher.Operators - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: com.linbit.linstor.netcom.PeerClosingConnectionException
Caused by: com.linbit.linstor.netcom.PeerClosingConnectionException: null
at com.linbit.linstor.netcom.TcpConnectorPeer.connectionClosing(TcpConnectorPeer.java:526)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ? Fetch volume allocated
Original Stack Trace:
at com.linbit.linstor.netcom.TcpConnectorPeer.connectionClosing(TcpConnectorPeer.java:526)
at com.linbit.linstor.netcom.TcpConnectorService.closeConnection(TcpConnectorService.java:999)
at com.linbit.linstor.netcom.TcpConnectorService.closeAllConnections(TcpConnectorService.java:1034)
at com.linbit.linstor.netcom.TcpConnectorService.uninitialize(TcpConnectorService.java:1287)
at com.linbit.linstor.netcom.TcpConnectorService.run(TcpConnectorService.java:777)
at java.base/java.lang.Thread.run(Thread.java:829)
In the error report, there was:
ERROR REPORT 6526B03E-00000-000003
============================================================
Application: LINBIT? LINSTOR
Module: Controller
Version: 1.24.2
Build ID: 60f9b2aa34ad16f7a0eb5ad8c8e306a2616ae061
Build time: 2023-10-11T07:48:43+00:00
Error time: 2023-10-12 09:37:19
Node: linstor-controller-fb6b746dd-s86dl
============================================================
Reported error:
===============
Category: RuntimeException
Class name: NullPointerException
Class canonical name: java.lang.NullPointerException
Generated at: Method 'establishConnection', Source file 'TcpConnectorService.java', Line #961
Call backtrace:
Method Native Class:Line number
establishConnection N com.linbit.linstor.netcom.TcpConnectorService:961
run N com.linbit.linstor.netcom.TcpConnectorService:708
run N java.lang.Thread:829
END OF ERROR REPORT.
Restarting the linstor-controller resolves the issue. After restarting the linstor-controller, all nodes went to Online state, and the command linstor sp l executed without errors.
I have sent sos report to @ghernadi