opensearch-k8s-operator
OpenSearch upgrade to v2.x stuck midway
Hi team,
I upgraded the OpenSearch cluster from v1.3.1 to v2.2.1. It looks like the operator upgraded the data nodes and Dashboards, but the masters are stuck at v1.3.1.
- The operator relaunched both the master and data pods, along with the Dashboards pod.
Here are some more details:
StatefulSets:
```
❯ kubectl get sts dev-opensearch-logging-cluster-masters -oyaml | egrep -A2 -i 'image|upgradeStrategy'
image: docker.io/opensearchproject/opensearch:1.3.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
--
image: public.ecr.aws/opsterio/busybox:1.27.2-buildx
imagePullPolicy: IfNotPresent
name: init
resources: {}
❯ kubectl get sts dev-opensearch-logging-cluster-nodes -oyaml | egrep -A2 -i 'image|upgradeStrategy'
image: docker.io/opensearchproject/opensearch:2.2.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
--
image: public.ecr.aws/opsterio/busybox:1.27.2-buildx
imagePullPolicy: IfNotPresent
name: init
resources: {}
```
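To compare only the OpenSearch container image of each StatefulSet, rather than grepping the whole YAML, a kubectl jsonpath query is one option. This is a minimal sketch that assumes the namespace dev-opensearch (taken from the cluster resource) and a main container named opensearch (as in the operator log); the helpers image_tag, same_tag and list_opensearch_images are hypothetical.

```shell
#!/bin/sh
# Print the tag of an image reference, e.g. ".../opensearch:1.3.1" -> "1.3.1".
# Assumes an explicit ":tag" and no "@sha256:" digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Succeed (exit 0) when two image references carry the same tag.
same_tag() {
  [ "$(image_tag "$1")" = "$(image_tag "$2")" ]
}

# Print "<statefulset><TAB><image of the opensearch container>" per StatefulSet.
# Assumes namespace dev-opensearch; needs access to the live cluster.
list_opensearch_images() {
  kubectl get sts -n dev-opensearch -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[?(@.name=="opensearch")].image}{"\n"}{end}'
}
```

With the images shown above, same_tag on the masters image (1.3.1) and the nodes image (2.2.1) returns non-zero, which is the mismatch described in this issue.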
OpenSearch cluster resource (it seems to be stuck upgrading):
```
Name: dev-opensearch-logging-cluster
Namespace: dev-opensearch
Labels: app=dev-opensearch-logging-cluster
        app.kubernetes.io/managed-by=Helm
        app.kubernetes.io/version=2.1.0
        helm.sh/chart=opensearch-cluster-1.0.0
Annotations: meta.helm.sh/release-name: dev-opensearch
             meta.helm.sh/release-namespace: default
API Version: opensearch.opster.io/v1
Kind: OpenSearchCluster
Metadata:
  Creation Timestamp: 2022-09-05T05:07:28Z
  Finalizers:
    Opster
  Generation: 13
  Managed Fields:
    API Version: opensearch.opster.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"Opster":
      f:spec:
        f:bootstrap:
          .:
          f:resources:
        f:dashboards:
          f:opensearchCredentialsSecret:
          f:tls:
            f:caSecret:
            f:secret:
        f:security:
          f:tls:
            f:http:
              f:caSecret:
              f:secret:
            f:transport:
              f:caSecret:
              f:secret:
    Manager: manager
    Operation: Update
    Time: 2022-09-05T05:07:28Z
    API Version: opensearch.opster.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        .:
        f:componentsStatus:
        f:initialized:
        f:phase:
        f:version:
    Manager: manager
    Operation: Update
    Subresource: status
    Time: 2022-09-05T05:15:46Z
    API Version: opensearch.opster.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:labels:
          .:
          f:app:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
      f:spec:
        .:
        f:confMgmt:
          .:
          f:smartScaler:
        f:dashboards:
          .:
          f:enable:
          f:replicas:
          f:resources:
            .:
            f:limits:
              .:
              f:cpu:
              f:memory:
            f:requests:
              .:
              f:cpu:
              f:memory:
          f:tls:
            .:
            f:enable:
            f:generate:
          f:version:
        f:general:
          .:
          f:httpPort:
          f:pluginsList:
          f:serviceName:
          f:vendor:
          f:version:
        f:nodePools:
        f:security:
          .:
          f:tls:
            .:
            f:http:
              .:
              f:generate:
            f:transport:
              .:
              f:generate:
              f:perNode:
    Manager: helm
    Operation: Update
    Time: 2022-09-07T11:10:50Z
  Resource Version: 8122795
  UID: 62b8456d-abec-4625-bb85-3e9f4a99b9fa
Spec:
  Bootstrap:
    Resources:
  Conf Mgmt:
    Smart Scaler: true
  Dashboards:
    Enable: true
    Opensearch Credentials Secret:
    Replicas: 1
    Resources:
      Limits:
        Cpu: 500m
        Memory: 2Gi
      Requests:
        Cpu: 500m
        Memory: 2Gi
    Tls:
      Ca Secret:
      Enable: true
      Generate: true
      Secret:
    Version: 2.2.1
  General:
    Http Port: 9200
    Plugins List:
      repository-s3
    Service Name: dev-opensearch-logging-cluster
    Vendor: opensearch
    Version: 2.2.1
  Node Pools:
    Component: masters
    Disk Size: 3Gi
    Persistence:
      Pvc:
        Access Modes:
          ReadWriteOnce
        Storage Class: aws-ebs-standard-persistent
    Replicas: 3
    Resources:
      Limits:
        Cpu: 1000m
        Memory: 2Gi
      Requests:
        Cpu: 500m
        Memory: 2Gi
    Roles:
      master
    Component: nodes
    Disk Size: 20Gi
    Persistence:
      Pvc:
        Access Modes:
          ReadWriteOnce
        Storage Class: aws-ebs-standard-persistent
    Replicas: 3
    Resources:
      Limits:
        Cpu: 500m
        Memory: 2Gi
      Requests:
        Cpu: 500m
        Memory: 2Gi
    Roles:
      data
  Security:
    Tls:
      Http:
        Ca Secret:
        Generate: true
        Secret:
      Transport:
        Ca Secret:
        Generate: true
        Per Node: true
        Secret:
Status:
  Components Status:
    Component: Upgrader
    Description: nodes
    Status: Upgrading
    Component: Upgrader
    Description: nodes
    Status: Upgrading
  Initialized: true
  Phase: RUNNING
  Version: 1.3.1
Events: <none>
```
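In the resource above, the spec requests 2.2.1 (General and Dashboards) while the status still reports 1.3.1, and the Upgrader component lists nodes as Upgrading. A minimal sketch for checking that gap directly, assuming the CRD answers to the resource name opensearchcluster and using the field paths spec.general.version and status.version that appear in the managed fields; version_state and check_cluster_version are hypothetical helpers.

```shell
#!/bin/sh
# Compare the requested version (spec) with the reported version (status).
# Arguments: spec version, status version. Prints a one-line verdict.
version_state() {
  if [ "$1" = "$2" ]; then
    printf 'in sync at %s\n' "$1"
  else
    printf 'pending: status %s, spec %s\n' "${2:-<unset>}" "$1"
  fi
}

# Read both fields from the live resource (names and namespace as shown above).
# A ':' separator keeps an empty status field from shifting the values.
check_cluster_version() {
  out=$(kubectl get opensearchcluster dev-opensearch-logging-cluster -n dev-opensearch \
    -o jsonpath='{.spec.general.version}:{.status.version}')
  IFS=: read -r spec_version status_version <<EOF
$out
EOF
  version_state "$spec_version" "$status_version"
}
```

For the resource above, this would print "pending: status 1.3.1, spec 2.2.1".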
I then tried to manually change the image tag in the masters StatefulSet to v2.2.1, but the operator's controller manager seems to revert it and resync the StatefulSet to its own state:
```
1.6631431563239067e+09 DEBUG controller.opensearchcluster resource diff {"reconciler group": "opensearch.opster.io", "reconciler kind": "OpenSearchCluster", "name": "dev-opensearch-logging-cluster", "namespace": "dev-opensearch", "reconciler": "cluster", "name": "dev-opensearch-logging-cluster-masters", "namespace": "dev-opensearch", "apiVersion": "apps/v1", "kind": "StatefulSet", "patch": "{\"spec\":{\"template\":{\"spec\":{\"$setElementOrder/containers\":[{\"name\":\"opensearch\"}],\"containers\":[{\"image\":\"docker.io/opensearchproject/opensearch:1.3.1\",\"name\":\"opensearch\"}]}}}}"}
1.6631431563244636e+09 DEBUG controller.opensearchcluster updating resource {"reconciler group": "opensearch.opster.io", "reconciler kind": "OpenSearchCluster", "name": "dev-opensearch-logging-cluster", "namespace": "dev-opensearch", "reconciler": "cluster", "name": "dev-opensearch-logging-cluster-masters", "namespace": "dev-opensearch", "apiVersion": "apps/v1", "kind": "StatefulSet"}
0 repository-s3
1.663143156337469e+09 DEBUG controller.opensearchcluster resource updated {"reconciler group": "opensearch.opster.io", "reconciler kind": "OpenSearchCluster", "name": "dev-opensearch-logging-cluster", "namespace": "dev-opensearch", "reconciler": "cluster", "name": "dev-opensearch-logging-cluster-masters", "namespace": "dev-opensearch", "apiVersion": "apps/v1", "kind": "StatefulSet"}
```
I'm running out of options for getting this sorted out. Any help would be appreciated.
FYI: I initially started the upgrade to v2.2.0; when it got stuck, I pushed v2.2.1 to see if that would move anything.
Hi @ghiya-arpit. I'm not the most knowledgeable regarding the upgrade component of the operator, but it looks like the operator is still waiting for the data nodes to finish their upgrade. Can you check the status of the data nodes StatefulSet, specifically whether updatedReplicas is set to 3?
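One way to answer that question is to read spec.replicas, status.updatedReplicas and status.readyReplicas from the StatefulSet; kubectl rollout status statefulset/dev-opensearch-logging-cluster-nodes -n dev-opensearch reports similar progress interactively. The sketch below assumes the namespace dev-opensearch; rollout_done and check_data_nodes are hypothetical helpers, and treating "updated == ready == replicas" as done is a simplification, not necessarily the operator's exact criterion.

```shell
#!/bin/sh
# Succeed when all replicas run the current revision and are ready.
# Arguments: spec.replicas, status.updatedReplicas, status.readyReplicas.
# jsonpath prints absent status fields as empty strings, so treat empty as 0.
rollout_done() {
  [ "${2:-0}" -eq "${1:-0}" ] && [ "${3:-0}" -eq "${1:-0}" ]
}

# Query the data-node StatefulSet (namespace dev-opensearch assumed).
# Fields are ':'-separated so an empty field does not shift the others.
check_data_nodes() {
  out=$(kubectl get sts dev-opensearch-logging-cluster-nodes -n dev-opensearch \
    -o jsonpath='{.spec.replicas}:{.status.updatedReplicas}:{.status.readyReplicas}')
  IFS=: read -r replicas updated ready <<EOF
$out
EOF
  if rollout_done "$replicas" "$updated" "$ready"; then
    echo "data nodes fully rolled out"
  else
    echo "still rolling: replicas=$replicas updated=${updated:-0} ready=${ready:-0}"
  fi
}
```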
> Now, I tried to manually update statefulset to update the tag to v2.2.1 but operator controller manager seems to be reverting it and syncing it with the change it has.
That is correct: manual changes to the generated objects are not possible. The operator will always overwrite them with the state configured in the custom resource.
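Since the StatefulSets are derived from the custom resource, a version change has to be made on the OpenSearchCluster object itself. A minimal sketch with kubectl patch, assuming the field paths spec.general.version and spec.dashboards.version shown in the resource above; version_patch and patch_cluster_version are hypothetical helpers. Custom resources do not support strategic merge patches, hence --type merge.

```shell
#!/bin/sh
# Build a JSON merge patch that sets both the OpenSearch and Dashboards versions.
version_patch() {
  printf '{"spec":{"general":{"version":"%s"},"dashboards":{"version":"%s"}}}\n' "$1" "$1"
}

# Apply it to the custom resource (not the StatefulSet), so the operator
# reconciles the StatefulSets itself. Names and namespace as shown above.
patch_cluster_version() {
  kubectl patch opensearchcluster dev-opensearch-logging-cluster -n dev-opensearch \
    --type merge -p "$(version_patch "$1")"
}
```

Because this cluster carries the label app.kubernetes.io/managed-by=Helm, changing the version in the chart values and running helm upgrade is probably the more durable route; a later helm upgrade with the old values would likely undo a manual patch.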
I tried to reproduce your problem with a local cluster but couldn't. I started a cluster on 1.3.1, added sample data, then updated the versions to 2.2.1. After a few minutes, all pods had been recreated with the new image.
@ghiya-arpit One more thing: can you retry the upgrade while also changing the masters' role from "master" to "cluster_manager"?
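OpenSearch 2.x introduced cluster_manager as the replacement name for the master node role; "master" is deprecated but still accepted. A minimal sketch of switching the role through the custom resource with a JSON patch, assuming (from the spec above) that the masters pool is nodePools[0] and "master" is its only role; role_patch and apply_role_patch are hypothetical helpers. As with the version, a Helm-managed cluster would more durably get this change through its chart values.

```shell
#!/bin/sh
# Build a JSON patch that replaces the first role of node pool number $1 with $2.
role_patch() {
  printf '[{"op":"replace","path":"/spec/nodePools/%s/roles/0","value":"%s"}]\n' "$1" "$2"
}

# Apply it to the OpenSearchCluster resource (names and namespace as shown above).
apply_role_patch() {
  kubectl patch opensearchcluster dev-opensearch-logging-cluster -n dev-opensearch \
    --type json -p "$(role_patch 0 cluster_manager)"
}
```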
Closing, as there was no further response from the issue reporter.