
[BUG] OpenSearch operator panics and crashes when adding an OpenSearchISMPolicy

Open · nilushancosta opened this issue 9 months ago · 1 comment

What is the bug?

When an OpenSearchISMPolicy is added while the OpenSearch cluster is still being created, the controller panics, resulting in a container crash:

2024-05-06T18:19:54.202Z	INFO	Reconciling OpensearchISMPolicy	{"controller": "opensearchismpolicy", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchISMPolicy", "OpenSearchISMPolicy": {"name":"sample-policy","namespace":"test"}, "namespace": "test", "name": "sample-policy", "reconcileID": "adc1b967-662a-42d0-9c17-95e048ad0ad6", "tenant": {"name":"sample-policy","namespace":"test"}}
2024-05-06T18:19:54.279Z	DEBUG	events	error creating opensearch client	{"type": "Warning", "object": {"kind":"OpenSearchISMPolicy","namespace":"test","name":"sample-policy","uid":"abab26b9-2ca0-4882-a167-4cf37994dcb9","apiVersion":"opensearch.opster.io/v1","resourceVersion":"463314"}, "reason": "OpensearchError"}
2024-05-06T18:19:54.284Z	INFO	Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference	{"controller": "opensearchismpolicy", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchISMPolicy", "OpenSearchISMPolicy": {"name":"sample-policy","namespace":"test"}, "namespace": "test", "name": "sample-policy", "reconcileID": "adc1b967-662a-42d0-9c17-95e048ad0ad6"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x11f2d64]

goroutine 442 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:115 +0x1a4
panic({0x141dec0?, 0x27073d0?})
	/usr/local/go/src/runtime/panic.go:770 +0x124
github.com/Opster/opensearch-k8s-operator/opensearch-operator/opensearch-gateway/services.(*OsClusterClient).GetISMConfig(0x0, {0x18fcd30, 0x4000e77dd0}, {0x4000c5a410?, 0x0?})
	/workspace/opensearch-gateway/services/os_client.go:314 +0x44
github.com/Opster/opensearch-k8s-operator/opensearch-operator/opensearch-gateway/services.PolicyExists({0x18fcd30?, 0x4000e77dd0?}, 0x4001436700?, {0x4000c5a410?, 0x7?})
	/workspace/opensearch-gateway/services/os_ism_service.go:31 +0x4c
github.com/Opster/opensearch-k8s-operator/opensearch-operator/pkg/reconcilers.(*IsmPolicyReconciler).Reconcile(0x40008d5d00)
	/workspace/pkg/reconcilers/ismpolicy.go:159 +0x72c
github.com/Opster/opensearch-k8s-operator/opensearch-operator/controllers.(*OpensearchISMPolicyReconciler).Reconcile(0x400051abe0, {0x18fcd30, 0x4000e77dd0}, {{{0x4001558638, 0x4}, {0x4001558640, 0xd}}})
	/workspace/controllers/opensearchism_controller.go:53 +0x2ec
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x18fcd30?, {0x18fcd30?, 0x4000e77dd0?}, {{{0x4001558638?, 0x1348fc0?}, {0x4001558640?, 0x4000677e08?}}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x400028a640, {0x18fcd68, 0x400051b630}, {0x149b600, 0x40002689e0})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314 +0x294
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x400028a640, {0x18fcd68, 0x400051b630})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265 +0x198
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 129
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x404

The operator pod will crash several times and then continue running.
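The stack trace points at the likely cause: the "error creating opensearch client" event is emitted, but the reconciler then calls `GetISMConfig` on a nil `*OsClusterClient` (note the `0x0` receiver in the trace), which dereferences a nil pointer. The following is a minimal, self-contained sketch of that failure pattern and the obvious guard; all type and function names here are illustrative stand-ins, not the operator's actual code.

```go
// Sketch of the failure pattern suggested by the stack trace: client
// construction fails while the cluster is booting, but the nil client is
// still used. Names (osClusterClient, newClient, reconcile) are hypothetical.
package main

import (
	"errors"
	"fmt"
)

// osClusterClient stands in for services.OsClusterClient.
type osClusterClient struct{ endpoint string }

// GetISMConfig dereferences its receiver; calling it on a nil
// *osClusterClient is what produces the SIGSEGV seen in the trace.
func (c *osClusterClient) GetISMConfig(policyID string) string {
	return fmt.Sprintf("config for %s from %s", policyID, c.endpoint)
}

// newClient simulates client creation failing while the cluster is unreachable.
func newClient(clusterReady bool) (*osClusterClient, error) {
	if !clusterReady {
		return nil, errors.New("error creating opensearch client")
	}
	return &osClusterClient{endpoint: "https://my-first-cluster:9200"}, nil
}

// reconcile shows the guard: bail out (letting controller-runtime requeue)
// instead of proceeding with a nil client.
func reconcile(clusterReady bool) error {
	client, err := newClient(clusterReady)
	if err != nil {
		return fmt.Errorf("cluster not reachable yet, requeueing: %w", err)
	}
	fmt.Println(client.GetISMConfig("sample-policy"))
	return nil
}

func main() {
	if err := reconcile(false); err != nil {
		fmt.Println("handled:", err) // no panic: the error is surfaced instead
	}
	_ = reconcile(true)
}
```

With the guard in place, a temporarily unreachable cluster yields a returned error (and a requeue) rather than a crash loop.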

How can one reproduce the bug?

  1. Install the operator:
helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.6.0 -n test
  2. Create an OpenSearch cluster using kubectl apply. This is the cluster definition I used:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: test
spec:
  general:
    serviceName: my-first-cluster
    version: 2.11.1
  dashboards:
    enable: false
    version: 2.11.1
    replicas: 0
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
      resources:
         requests:
            memory: "1Gi"
            cpu: "500m"
         limits:
            memory: "1Gi"
            cpu: "500m"
      roles:
        - "cluster_manager"
        - "data"
  3. Apply the following ISM policy using kubectl apply:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchISMPolicy
metadata:
   name: sample-policy
   namespace: test
spec:
   opensearchCluster:
      name: my-first-cluster
   description: Sample policy
   policyId: sample-policy
   defaultState: hot
   states:
      - name: hot
        actions:
           - replicaCount:
                numberOfReplicas: 4
        transitions:
           - stateName: warm
             conditions:
                minIndexAge: "10d"
      - name: warm
        actions:
           - replicaCount:
                numberOfReplicas: 2
        transitions:
           - stateName: delete
             conditions:
                minIndexAge: "30d"
      - name: delete
        actions:
           - delete: {}

At this point, the operator pod exits with the panic shown above.
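The crash loop can be observed with standard kubectl commands; the label selector below is illustrative and may need adjusting to match the operator's actual pod labels in your installation:

```shell
# Watch the operator pod restart count climb (CrashLoopBackOff)
kubectl -n test get pods -w

# Inspect the panic from the previous (crashed) container instance
kubectl -n test logs --previous deploy/opensearch-operator-controller-manager
```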

What is the expected behavior?

Expected the ISM policy to be added without any issue.

What is your host/environment?

Kubernetes 1.25, OpenSearch 2.11.1, OpenSearch operator 2.6.0

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

If I perform step 2 above, wait for the OpenSearch cluster to finish being created (i.e. the 3 nodes reach a running state and the cluster health is green), and then perform step 3 (adding the ISM policy), the panic does not happen. But if I perform step 3 immediately after step 2, the operator panics and crashes several times.

However, when using deployment pipelines, we cannot control the delay between applying resources.
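Until the operator handles this itself, a pipeline can approximate the manual delay by gating the policy on the cluster pods becoming Ready. This is only a workaround sketch; the label selector and manifest filename below are assumptions, so check the labels the operator actually sets on your pods:

```shell
# Wait (up to 10 minutes) for all pods belonging to the cluster to be Ready.
# The label selector is an assumption; verify with: kubectl -n test get pods --show-labels
kubectl -n test wait pod \
  --selector=opster.io/opensearch-cluster=my-first-cluster \
  --for=condition=Ready --timeout=600s

# Only then apply the ISM policy (filename is illustrative)
kubectl -n test apply -f ism-policy.yaml
```

Note that pod readiness is a weaker condition than green cluster health, but it is usually enough to make the operator's client creation succeed.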

nilushancosta avatar May 06 '24 08:05 nilushancosta

Hi @nilushancosta. Thanks for reporting this. This is clearly a bug and the operator should just wait if the cluster is not yet correctly reachable.

swoehrl-mw avatar May 07 '24 12:05 swoehrl-mw