opensearch-k8s-operator
Operator > 2.0.0 fails to create Bootstrap Pod
Using the following ResourceDefinition
---
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: our-cluster
  namespace: our-namespace
spec:
  general:
    serviceName: open-search
    version: 2.2.0
    setVMMaxMapCount: true
  security:
    tls:
      transport:
        generate: true
        perNode: true
      http:
        generate: true
  dashboards:
    enable: true
    tls:
      enable: true
      generate: true
    version: 2.2.0
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: open-search-nodes
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      persistence:
        pvc:
          storageClass: standard
          accessModes:
            - ReadWriteOnce
      roles:
        - "data"
        - "master"
and the 2.0.3, 2.0.2, or 2.0.1 operator, the bootstrap pod isn't created and the operator fails to create a working OpenSearch cluster. With the 2.0.0 operator the bootstrap pod is created successfully; however, the 2.0.0 operator does not support OpenSearch 2.x.
This is fixed in the latest release; please give it a try.
I can confirm that the bootstrap pod is now created as expected. Thanks!
Hi @idanl21 / @edwardsmit: I am trying to run the same manifest (without the tolerations and nodeSelector) locally in my kind cluster. I updated the operator Helm chart to version 2.0.4, but I still do not see the bootstrap pod. Could you please help me understand if I am missing something here?
@madhukarmmallia-plivo You're right. I was deploying OpenSearch 1.3.2 (replace both 2.2.0 values with 1.3.2 in the above manifest). Deploying a brand new 2.2.0 cluster with the above manifest still does not work. Deploying a 1.3.2 cluster first and then upgrading it to 2.2.0 does work.
Thanks for confirming, @edwardsmit. I tried some debugging from my end. It looks like OpenSearch v2+ is not picking up the 'cluster_manager' node role that is assigned to the bootstrap instance of OpenSearch.
@idanl21 Can we reopen this issue?
@madhukarmmallia-plivo I also did some checking. The bootstrap pod is only created if the cluster status is not initialized (https://github.com/Opster/opensearch-k8s-operator/blob/main/opensearch-operator/pkg/reconcilers/cluster.go#L81), and that is determined by whether all master pods are ready (https://github.com/Opster/opensearch-k8s-operator/blob/main/opensearch-operator/controllers/opensearchController.go#L249). To determine which pods are master pods, the operator checks the roles of the pods, and there is a switch in the operator code based on the version: for 1.x it uses `master`, for 2.x it uses `cluster_manager`. Thus, if you deploy a 2.x cluster but use the role `master`, the operator finds no master pods to wait for, the code directly sets `Initialized=true`, and the bootstrap pod is never started.
I'm not sure if the role `master` is still valid for OpenSearch 2.x. If it is, we should extend the operator to treat both roles as indicating a master node. If it is not, we need to fix the example YAML and maybe even let the operator warn the user (e.g. in the status or via an event) that the `master` role cannot be used for a 2.x cluster.
I've just lost half a day because of this. Luckily I came across this ticket. Using the example and then replacing role 'master' with 'cluster_manager' worked. Inclusion can be hard sometimes...
The solution is to change the `master` role to the `cluster_manager` role in the OpenSearch cluster YAML file. A fix PR and an example YAML file for a 2.0 cluster were uploaded and merged.
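For anyone landing here, a minimal sketch of that change applied to the node pool from the manifest at the top of this issue. It assumes an OpenSearch 2.x cluster and that everything else in the node pool stays as it was; only the roles list differs, and the other node pool fields are left out for brevity.

```yaml
# Fragment of spec.nodePools from the manifest in the original report.
# Only the roles list changes; the remaining fields are omitted here.
nodePools:
  - component: open-search-nodes
    replicas: 3
    diskSize: "5Gi"
    roles:
      - "data"
      - "cluster_manager"  # was "master"; per this thread, the operator looks for this role on 2.x
```

As reported above, a 1.x cluster still uses `master`, so this replacement only applies when `version` is 2.x.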