piraeus-operator

Linstor-Controller crashing

Open dimm0 opened this issue 2 years ago • 7 comments

Error:

root@linstor-controller-5557d9ccb4-dffn8:/# linstor error-reports show 64C941EE-00000-000004
ERROR REPORT 64C941EE-00000-000004

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Controller
Version:                            1.23.0
Build ID:                           28dbd33ced60d75a2a0562bf5e9bc6b800ae8361
Build time:                         2023-05-23T06:27:14+00:00
Error time:                         2023-08-01 17:35:50
Node:                               linstor-controller-5557d9ccb4-dffn8

============================================================

Reported error:
===============

Category:                           Error
Class name:                         ImplementationError
Class canonical name:               com.linbit.ImplementationError
Generated at:                       Method 'run', Source file 'SpaceTrackingTask.java', Line #300

Error message:                      Uncaught exception in k

Call backtrace:

    Method                                   Native Class:Line number
    run                                      N      com.linbit.linstor.spacetracking.k:300
    run                                      N      java.lang.Thread:829

Caused by:
==========

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'a', Source file 'SpaceTrackingApiCallHandler.java', Line #108


Call backtrace:

    Method                                   Native Class:Line number
    a                                        N      com.linbit.linstor.core.apicallhandler.controller.internal.a:108
    a                                        N      com.linbit.linstor.core.apicallhandler.controller.internal.a:80
    a                                        N      com.linbit.linstor.spacetracking.k:884
    c                                        N      com.linbit.linstor.spacetracking.k:548
    run                                      N      com.linbit.linstor.spacetracking.k:269
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.

Operator version 2.1.1

dimm0 avatar Aug 01 '23 17:08 dimm0

Please open an issue over at https://github.com/linbit/linstor-server

Does the issue happen right at start up? If not, have you tried restarting the Pod?

WanzenBug avatar Aug 02 '23 06:08 WanzenBug

OK, will do. It happens a couple of minutes after startup. Then the pod crashes and tries to start again.

dimm0 avatar Aug 02 '23 15:08 dimm0

There's a fix in 1.24. How can I update the controller version in the operator? Add a patch for the controller deployment?

dimm0 avatar Aug 09 '23 06:08 dimm0

You can edit the piraeus-operator-image-config ConfigMap which holds the image information. You need to change the linstor-satellite and linstor-controller tag.

WanzenBug avatar Aug 09 '23 06:08 WanzenBug
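
For readers following along, the ConfigMap edit described above might look roughly like the sketch below. Only the ConfigMap name and the two component names come from the comment. The `piraeus-datastore` namespace, the `0_piraeus_datastore_images.yaml` data key, the `base`/`image` values and the `v1.24.0` tag are assumptions based on a default operator v2 install, so check them against your own cluster (for example with `kubectl get configmap piraeus-operator-image-config -o yaml`) before changing anything.

```yaml
# Sketch only: namespace, data key, base, image and tag values are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: piraeus-operator-image-config
  namespace: piraeus-datastore        # assumed default namespace
data:
  0_piraeus_datastore_images.yaml: |  # assumed key name
    base: quay.io/piraeusdatastore    # assumed registry prefix
    components:
      linstor-controller:
        tag: v1.24.0                  # illustrative tag for "a fix in 1.24"
        image: piraeus-server
      linstor-satellite:
        tag: v1.24.0                  # keep in step with the controller tag
        image: piraeus-server
```

Other entries in the ConfigMap are omitted here and should be left as they are; only the two tags need to change.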

By the way, the original issue was only with the liveness probe for the SpaceTracking service. You could go back to 1.23.0 and patch the deployment to remove the livenessProbe.

Something like this should work:

apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  patches:
    - target:
        kind: Deployment
        name: linstor-controller
      patch: |
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: linstor-controller
        spec:
          template:
            spec:
              containers:
              - name: linstor-controller
                startupProbe:
                  $patch: delete
                livenessProbe:
                  $patch: delete

WanzenBug avatar Aug 09 '23 06:08 WanzenBug

I did that, there's a 2nd one...

https://github.com/LINBIT/linstor-server/issues/364#issuecomment-1664512738

dimm0 avatar Aug 09 '23 06:08 dimm0

Can't patch either:

manager 2023-08-11T00:13:58Z    ERROR    Reconciler error    {"controller": "linstorcluster", "controllerGroup": "piraeus.io", "controllerKind": "LinstorCluster", "LinstorCluster": {"name":"linstorcluster"}, "namespace": "", "name": "linstorcluster", "reconcileID": "7e38a128-3355-4a8c-b13d-e00b7d8e7e1c", "error": "Deployment.apps \"linstor-controller\" is invalid: spec.template.spec.containers[0].livenessProbe: Required value: must specify a handler type"}

dimm0 avatar Aug 11 '23 00:08 dimm0
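
The error above suggests that the strategic merge patch left an empty `livenessProbe` object on the container rather than removing the field. One possible alternative, not confirmed anywhere in this thread and untested against operator 2.1.1, is to use a JSON 6902 style patch instead. This assumes that the `patch` field accepts Kustomize JSON patches and that the controller is the first container in the Pod, as the `containers[0]` path in the error message indicates.

```yaml
# Untested sketch: assumes the LinstorCluster patch field accepts JSON 6902 patches.
apiVersion: piraeus.io/v1
kind: LinstorCluster
metadata:
  name: linstorcluster
spec:
  patches:
    - target:
        kind: Deployment
        name: linstor-controller
      patch: |
        # Remove the liveness probe from the first container (index 0 is an assumption).
        - op: remove
          path: /spec/template/spec/containers/0/livenessProbe
```

A JSON patch `remove` operation fails if the path does not exist, so this sketch only touches `livenessProbe`. A similar entry for `startupProbe` would only make sense if that field is actually present on the Deployment.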