piraeus-operator
Linstor-Controller crashing
Error:
root@linstor-controller-5557d9ccb4-dffn8:/# linstor error-reports show 64C941EE-00000-000004
ERROR REPORT 64C941EE-00000-000004
============================================================
Application: LINBIT® LINSTOR
Module: Controller
Version: 1.23.0
Build ID: 28dbd33ced60d75a2a0562bf5e9bc6b800ae8361
Build time: 2023-05-23T06:27:14+00:00
Error time: 2023-08-01 17:35:50
Node: linstor-controller-5557d9ccb4-dffn8
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'run', Source file 'SpaceTrackingTask.java', Line #300
Error message: Uncaught exception in k
Call backtrace:
Method Native Class:Line number
run N com.linbit.linstor.spacetracking.k:300
run N java.lang.Thread:829
Caused by:
==========
Category: RuntimeException
Class name: NullPointerException
Class canonical name: java.lang.NullPointerException
Generated at: Method 'a', Source file 'SpaceTrackingApiCallHandler.java', Line #108
Call backtrace:
Method Native Class:Line number
a N com.linbit.linstor.core.apicallhandler.controller.internal.a:108
a N com.linbit.linstor.core.apicallhandler.controller.internal.a:80
a N com.linbit.linstor.spacetracking.k:884
c N com.linbit.linstor.spacetracking.k:548
run N com.linbit.linstor.spacetracking.k:269
run N java.lang.Thread:829
END OF ERROR REPORT.
Operator version 2.1.1
Please open an issue over at https://github.com/linbit/linstor-server
Does the issue happen right at start up? If not, have you tried restarting the Pod?
OK, will do. It happens a couple of minutes after startup. Then the pod crashes and tries to start again.
There's a fix in 1.24. How can I update the controller version in the operator? Add a patch for the controller deployment?
You can edit the piraeus-operator-image-config ConfigMap which holds the image information. You need to change the linstor-satellite and linstor-controller tag.
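For illustration, the relevant part of that image configuration might look roughly like the excerpt below. This is only a sketch: the layout, the `piraeus-server` image name and the `v1.24.0` tag are assumptions, not values confirmed in this thread, so compare against the ConfigMap actually installed in your cluster before changing anything.

```yaml
# Hypothetical excerpt of the image list stored in the
# piraeus-operator-image-config ConfigMap (layout assumed, verify locally).
components:
  linstor-controller:
    tag: v1.24.0          # assumed tag for a 1.24 release
    image: piraeus-server # assumed image name
  linstor-satellite:
    tag: v1.24.0          # keep in sync with the controller tag
    image: piraeus-server
```

Both tags are changed together here, following the advice above to update linstor-satellite and linstor-controller.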
By the way, the original issue was only with the livenessProbe for the SpaceTracking service. You could go back to 1.23.0 and patch the deployment to remove the livenessProbe.
Something like this should work:
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  patches:
    - target:
        kind: Deployment
        name: linstor-controller
      patch: |
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: linstor-controller
        spec:
          template:
            spec:
              containers:
                - name: linstor-controller
                  startupProbe:
                    $patch: delete
                  livenessProbe:
                    $patch: delete
I did that, there's a 2nd one...
https://github.com/LINBIT/linstor-server/issues/364#issuecomment-1664512738
The patch can't be applied either:
manager 2023-08-11T00:13:58Z ERROR Reconciler error {"controller": "linstorcluster", "controllerGroup": "piraeus.io", "controllerKind": "LinstorCluster", "LinstorCluster": {"name":"linstorcluster"}, "namespace": "", "name": "linstorcluster", "reconcileID": "7e38a128-3355-4a8c-b13d-e00b7d8e7e1c", "error": "Deployment.apps \"linstor-controller\" is invalid: spec.template.spec.containers[0].livenessProbe: Required value: must specify a handler type"}