opensearch-k8s-operator
The admin password cannot be changed while the cluster is running
I deploy the cluster with the following two files.

opensearch-cluster.yaml:
```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: cluster
spec:
  general:
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: cluster
  dashboards:
    version: 2.2.0
    enable: true
    replicas: 2
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  security:
    config:
      securityConfigSecret:
        name: securityconfig-secret
      adminCredentialsSecret:
        name: admin-credentials-secret
    tls:
      transport:
        generate: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "50Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "cluster_manager"
    - component: nodes
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "data"
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-credentials-secret
type: Opaque
data:
  # admin
  username: YWRtaW4=
  # admin
  password: YWRtaW4=
```
securityconfig-secret.yaml is the default one; the initial user is admin:admin.
Everything works fine.
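For reference, a minimal sketch of how the initial credentials can be verified (the port-forward is illustrative; namespace `os` is taken from the kubectl command further down, the service name from serviceName in the spec):

```bash
# Forward the OpenSearch HTTP port locally and authenticate with the bootstrap credentials.
kubectl -n os port-forward svc/cluster 9200:9200 &
curl -k -u admin:admin https://localhost:9200/_cluster/health
```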
In order to change the admin password to admin123 while the cluster is running, I did the following (see the sketch after this list for how the hash and base64 values were generated):
- In securityconfig-secret.yaml, set the hash of the new admin password.
- In opensearch-cluster.yaml, set the base64 of the new admin password in admin-credentials-secret.
- After steps 1 and 2, the operator will not actively reconcile because the CR has not changed (my personal understanding), so I regenerated the securityconfig secret under the name securityconfig-secret-new and changed securityConfigSecret.name in the CR to securityconfig-secret-new as well.
- Reapply.
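For reference, a sketch of how the values used in the steps above can be produced; the hash.sh path is an assumption based on the standard security-plugin layout inside the OpenSearch image:

```bash
# bcrypt hash for internal_users.yml (run inside any running OpenSearch pod)
kubectl -n os exec cluster-masters-0 -- \
  /usr/share/opensearch/plugins/opensearch-security/tools/hash.sh -p admin123

# base64 value for the password field of admin-credentials-secret
echo -n 'admin123' | base64   # -> YWRtaW4xMjM=
```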
opensearch-cluster.yaml:
```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: cluster
spec:
  general:
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: cluster
  dashboards:
    version: 2.2.0
    enable: true
    replicas: 2
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  security:
    config:
      securityConfigSecret:
        name: securityconfig-secret-new
      adminCredentialsSecret:
        name: admin-credentials-secret
    tls:
      transport:
        generate: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "50Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "cluster_manager"
    - component: nodes
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "1Gi"
          cpu: "500m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      roles:
        - "data"
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-credentials-secret
type: Opaque
data:
  # admin
  username: YWRtaW4=
  # admin123
  password: YWRtaW4xMjM=
```
securityconfig-secret.yaml:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: securityconfig-secret-new
type: Opaque
stringData:
  action_groups.yml: |-
    _meta:
      type: "actiongroups"
      config_version: 2
  internal_users.yml: |-
    _meta:
      type: "internalusers"
      config_version: 2
    admin:
      hash: "$2y$12$5wpCOrAcmunb5Jy3NDgfteEv2JY6uDt1jNdTFXvEdH/TpeL/LpopK"
      reserved: true
      backend_roles:
      - "admin"
      description: "Demo admin user"
  .....
```
The result is strange: there is always one node that is not restarted, and its status is 0/1.
```
kubectl -n os logs -f pod/cluster-masters-0
[2022-09-28T03:38:37,497][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] cluster-manager node changed {previous [{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{
shard_indexing_pressure_enabled=true}], current []}, term: 2, version: 56, reason: becoming candidate: onLeaderFailure
[2022-09-28T03:38:37,498][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:37,672][WARN ][o.o.c.NodeConnectionsService] [cluster-masters-0] failed to connect to {cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing_press
ure_enabled=true} (tried [1] times)
org.opensearch.transport.ConnectTransportException: [cluster-masters-1][10.10.14.39:9300] connect_exception
at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1076) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:215) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-2.2.0.jar:2.2.0]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-2.2.0.jar:2.2.0]
at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: cluster-masters-1/10.10.14.39:9300
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
... 7 more
[2022-09-28T03:38:38,269][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] elected-as-cluster-manager ([2] nodes joined)[{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m
}{shard_indexing_pressure_enabled=true} elect leader, {cluster-masters-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_M
ANAGER_TASK_, _FINISH_ELECTION_], term: 3, version: 57, delta: cluster-manager node changed {previous [], current [{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_ind
exing_pressure_enabled=true}]}
[2022-09-28T03:38:38,271][WARN ][o.o.c.s.MasterService ] [cluster-masters-0] failing [elected-as-cluster-manager ([2] nodes joined)[{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.4
4:9300}{m}{shard_indexing_pressure_enabled=true} elect leader, {cluster-masters-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_
CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_]]: failed to commit cluster state version [57]
org.opensearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer cluster-manager for term 3 while handling publication
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1256) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:339) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:321) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:196) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:176) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:214) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-09-28T03:38:38,276][INFO ][o.o.c.c.JoinHelper ] [cluster-masters-0] failed to join {cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabl
ed=true} with JoinRequest{sourceNode={cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}, minimumTerm=2, optionalJoin=Optional[Join{term=3
, lastAcceptedTerm=2, lastAcceptedVersion=56, sourceNode={cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}, targetNode={cluster-masters-
0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [cluster-masters-0][10.10.50.44:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer cluster-manager for term 3 while handling publication
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1256) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:339) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:321) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:196) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:176) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:214) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-09-28T03:38:38,735][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] elected-as-cluster-manager ([2] nodes joined)[{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m
}{shard_indexing_pressure_enabled=true} elect leader, {cluster-masters-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_M
ANAGER_TASK_, _FINISH_ELECTION_], term: 5, version: 57, delta: cluster-manager node changed {previous [], current [{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_ind
exing_pressure_enabled=true}]}
[2022-09-28T03:38:38,742][INFO ][o.o.c.c.FollowersChecker ] [cluster-masters-0] FollowerChecker{discoveryNode={cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing
_pressure_enabled=true}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
org.opensearch.transport.NodeNotConnectedException: [cluster-masters-1][10.10.14.39:9300] Node not connected
at org.opensearch.transport.ClusterConnectionManager.getConnection(ClusterConnectionManager.java:206) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService.getConnection(TransportService.java:801) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.transport.TransportService.sendRequest(TransportService.java:718) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.FollowersChecker$FollowerChecker.handleWakeUp(FollowersChecker.java:348) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.FollowersChecker$FollowerChecker.start(FollowersChecker.java:336) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.FollowersChecker.lambda$setCurrentNodes$2(FollowersChecker.java:178) [opensearch-2.2.0.jar:2.2.0]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) [?:?]
at java.util.Iterator.forEachRemaining(Iterator.java:133) [?:?]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) [?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) [?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) [?:?]
at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:310) [?:?]
at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) [?:?]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) [?:?]
at org.opensearch.cluster.coordination.FollowersChecker.setCurrentNodes(FollowersChecker.java:171) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1301) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:339) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:321) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:196) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:176) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:214) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) [opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) [opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-09-28T03:38:38,744][INFO ][o.o.c.c.FollowersChecker ] [cluster-masters-0] FollowerChecker{discoveryNode={cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing
_pressure_enabled=true}, failureCountSinceLastSuccess=1, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty
[2022-09-28T03:38:39,018][INFO ][o.o.c.c.JoinHelper ] [cluster-masters-0] failed to join {cluster-masters-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enab
led=true} with JoinRequest{sourceNode={cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}, minimumTerm=5, optionalJoin=Optional.empty}
org.opensearch.transport.RemoteTransportException: [cluster-masters-2][10.10.240.33:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: became follower
at org.opensearch.cluster.coordination.JoinHelper$CandidateJoinAccumulator.lambda$close$3(JoinHelper.java:556) ~[opensearch-2.2.0.jar:2.2.0]
at java.util.HashMap$Values.forEach(HashMap.java:1065) ~[?:?]
at org.opensearch.cluster.coordination.JoinHelper$CandidateJoinAccumulator.close(JoinHelper.java:556) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:738) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:340) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.coordination.FollowersChecker$2.doRun(FollowersChecker.java:228) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-09-28T03:38:39,069][INFO ][o.o.c.c.JoinHelper ] [cluster-masters-0] failed to join {cluster-masters-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enab
led=true} with JoinRequest{sourceNode={cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}, minimumTerm=3, optionalJoin=Optional[Join{term=
4, lastAcceptedTerm=2, lastAcceptedVersion=56, sourceNode={cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.44:9300}{m}{shard_indexing_pressure_enabled=true}, targetNode={cluster-masters
-2}{V8QgnGeYTyqo8lZ6DSlLdw}{1XPncI5AQ7eiQ2bTxngiOg}{cluster-masters-2}{10.10.240.33:9300}{m}{shard_indexing_pressure_enabled=true}}]}
org.opensearch.transport.RemoteTransportException: [cluster-masters-2][10.10.240.33:9300][internal:cluster/coordination/join]
Caused by: org.opensearch.cluster.coordination.FailedToCommitClusterStateException: node is no longer cluster-manager for term 4 while handling publication
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1256) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:339) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:321) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:196) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:176) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:214) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282) ~[opensearch-2.2.0.jar:2.2.0]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245) ~[opensearch-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-09-28T03:38:39,580][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] cluster-manager node changed {previous [], current [{cluster-masters-0}{yXF2C00OSPe6GZWqZp5DCg}{hoJxXzTFTDKkaIar_lNOiw}{cluster-masters-0}{10.10.50.
44:9300}{m}{shard_indexing_pressure_enabled=true}]}, term: 5, version: 57, reason: Publication{term=5, version=57}
[2022-09-28T03:38:39,587][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:39,587][INFO ][o.o.i.i.ManagedIndexCoordinator] [cluster-masters-0] Cache cluster manager node onClusterManager time: 1664336319587
[2022-09-28T03:38:39,776][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] node-left[{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing_pressure_enabled=tr
ue} reason: disconnected], term: 5, version: 58, delta: removed {{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing_pressure_enabled=true}}
[2022-09-28T03:38:40,031][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] removed {{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{Cf3h2VgPQwy0sy5a0bTLYw}{cluster-masters-1}{10.10.14.39:9300}{m}{shard_indexing_pressure_enabled
=true}}, term: 5, version: 58, reason: Publication{term=5, version=58}
[2022-09-28T03:38:40,033][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Cluster node changed, node removed: true, node added: false
[2022-09-28T03:38:40,033][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Node removed: [5zr_CnuNRbC_0Ltimik4WQ]
[2022-09-28T03:38:40,033][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Hash ring build result: true
[2022-09-28T03:38:40,033][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:40,717][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:42,025][INFO ][o.o.c.r.a.AllocationService] [cluster-masters-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.opendistro_security][0]]]).
[2022-09-28T03:38:42,711][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:42,954][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:38:44,506][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:07,490][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:53318
[2022-09-28T03:39:08,671][INFO ][o.o.c.s.ClusterSettings ] [cluster-masters-0] updating [cluster.routing.allocation.enable] from [all] to [primaries]
[2022-09-28T03:39:08,672][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:09,298][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:09,527][INFO ][o.o.c.c.FollowersChecker ] [cluster-masters-0] FollowerChecker{discoveryNode={cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{-wJsZAFAQcK7XVHv5cZNLA}{cluster-nodes-1}{10.10.50.45:9300}{d}{shard_indexing_pre
ssure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} disconnected
[2022-09-28T03:39:09,528][INFO ][o.o.c.c.FollowersChecker ] [cluster-masters-0] FollowerChecker{discoveryNode={cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{-wJsZAFAQcK7XVHv5cZNLA}{cluster-nodes-1}{10.10.50.45:9300}{d}{shard_indexing_pre
ssure_enabled=true}, failureCountSinceLastSuccess=0, [cluster.fault_detection.follower_check.retry_count]=3} marking node as faulty
[2022-09-28T03:39:09,531][INFO ][o.o.c.r.a.AllocationService] [cluster-masters-0] updating number_of_replicas to [1] for indices [.opendistro_security]
[2022-09-28T03:39:09,533][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] node-left[{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{-wJsZAFAQcK7XVHv5cZNLA}{cluster-nodes-1}{10.10.50.45:9300}{d}{shard_indexing_pressure_enabled=true}
reason: disconnected], term: 5, version: 64, delta: removed {{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{-wJsZAFAQcK7XVHv5cZNLA}{cluster-nodes-1}{10.10.50.45:9300}{d}{shard_indexing_pressure_enabled=true}}
[2022-09-28T03:39:09,689][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] removed {{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{-wJsZAFAQcK7XVHv5cZNLA}{cluster-nodes-1}{10.10.50.45:9300}{d}{shard_indexing_pressure_enabled=tru
e}}, term: 5, version: 64, reason: Publication{term=5, version=64}
[2022-09-28T03:39:09,689][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Cluster node changed, node removed: true, node added: false
[2022-09-28T03:39:09,690][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Node removed: [37p1XkjBS0WUDwgzySo5Vg]
[2022-09-28T03:39:09,690][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Remove data node from AD version hash ring: 37p1XkjBS0WUDwgzySo5Vg
[2022-09-28T03:39:09,690][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Hash ring build result: true
[2022-09-28T03:39:09,690][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:09,691][INFO ][o.o.c.r.DelayedAllocationService] [cluster-masters-0] scheduling reroute for delayed shards in [59.8s] (2 delayed shards)
[2022-09-28T03:39:09,703][WARN ][o.o.c.r.a.AllocationService] [cluster-masters-0] [security-auditlog-2022.09.28][0] marking unavailable shards as stale: [_VCPtw-pRQqRGbp5zXe23Q]
[2022-09-28T03:39:09,771][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:11,714][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:14,215][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:16,714][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:19,212][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:21,715][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:24,275][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:26,718][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:29,280][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:31,772][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:34,289][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:34,794][WARN ][o.o.c.r.a.AllocationService] [cluster-masters-0] [.opendistro_security][0] marking unavailable shards as stale: [_VgsgWdkQrmZeoEAsVy8TQ]
[2022-09-28T03:39:34,876][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:36,805][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:37,579][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:53538
[2022-09-28T03:39:38,401][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:53582
[2022-09-28T03:39:38,690][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] node-join[{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{1faTHkFeRtK3U1Z5sSb1gg}{cluster-masters-1}{10.10.14.40:9300}{m}{shard_indexing_pressure_enabled=tr
ue} join existing leader], term: 5, version: 67, delta: added {{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{1faTHkFeRtK3U1Z5sSb1gg}{cluster-masters-1}{10.10.14.40:9300}{m}{shard_indexing_pressure_enabled=true}}
[2022-09-28T03:39:39,296][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:39,588][INFO ][o.o.i.i.ManagedIndexCoordinator] [cluster-masters-0] Performing move cluster state metadata.
[2022-09-28T03:39:39,589][INFO ][o.o.i.i.MetadataService ] [cluster-masters-0] ISM config index not exist, so we cancel the metadata migration job.
[2022-09-28T03:39:41,720][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:44,289][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:44,450][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] added {{cluster-masters-1}{5zr_CnuNRbC_0Ltimik4WQ}{1faTHkFeRtK3U1Z5sSb1gg}{cluster-masters-1}{10.10.14.40:9300}{m}{shard_indexing_pressure_enabled=t
rue}}, term: 5, version: 67, reason: Publication{term=5, version=67}
[2022-09-28T03:39:44,450][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Cluster node changed, node removed: false, node added: true
[2022-09-28T03:39:44,450][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Node added: [5zr_CnuNRbC_0Ltimik4WQ]
[2022-09-28T03:39:44,451][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:39:44,455][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] All nodes with known AD version: {FO_xvmbESZmRtTMvXYQ4qQ=ADNodeInfo{version=2.2.0, isEligibleDataNode=true}, 5zr_CnuNRbC_0Ltimik4WQ=ADNodeInfo{version=2
.2.0, isEligibleDataNode=false}, wQZ7icbpTzKmgRYfUQavbQ=ADNodeInfo{version=2.2.0, isEligibleDataNode=true}, yXF2C00OSPe6GZWqZp5DCg=ADNodeInfo{version=2.2.0, isEligibleDataNode=false}, V8QgnGeYTyqo8lZ6DSlLdw=ADNodeInfo{version=2.2.0,
isEligibleDataNode=false}}
[2022-09-28T03:39:44,455][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Hash ring build result: true
[2022-09-28T03:39:46,779][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:49,275][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:51,775][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:54,281][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:56,788][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:39:59,283][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:01,780][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:04,280][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:06,792][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:07,490][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:53846
[2022-09-28T03:40:09,275][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:09,408][INFO ][o.o.c.r.a.AllocationService] [cluster-masters-0] updating number_of_replicas to [2] for indices [.opendistro_security]
[2022-09-28T03:40:09,409][INFO ][o.o.c.s.MasterService ] [cluster-masters-0] node-join[{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{48xg9cu6SUq7yHoMXTr5AQ}{cluster-nodes-1}{10.10.50.47:9300}{d}{shard_indexing_pressure_enabled=true}
join existing leader], term: 5, version: 68, delta: added {{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{48xg9cu6SUq7yHoMXTr5AQ}{cluster-nodes-1}{10.10.50.47:9300}{d}{shard_indexing_pressure_enabled=true}}
[2022-09-28T03:40:11,782][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:14,281][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:14,601][INFO ][o.o.c.s.ClusterApplierService] [cluster-masters-0] added {{cluster-nodes-1}{37p1XkjBS0WUDwgzySo5Vg}{48xg9cu6SUq7yHoMXTr5AQ}{cluster-nodes-1}{10.10.50.47:9300}{d}{shard_indexing_pressure_enabled=true}
}, term: 5, version: 68, reason: Publication{term=5, version=68}
[2022-09-28T03:40:14,601][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Cluster node changed, node removed: false, node added: true
[2022-09-28T03:40:14,601][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Node added: [37p1XkjBS0WUDwgzySo5Vg]
[2022-09-28T03:40:14,601][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:40:14,672][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] Add data node to AD version hash ring: 37p1XkjBS0WUDwgzySo5Vg
[2022-09-28T03:40:14,673][INFO ][o.o.a.c.HashRing ] [cluster-masters-0] All nodes with known AD version: {37p1XkjBS0WUDwgzySo5Vg=ADNodeInfo{version=2.2.0, isEligibleDataNode=true}, FO_xvmbESZmRtTMvXYQ4qQ=ADNodeInfo{version=2
.2.0, isEligibleDataNode=true}, 5zr_CnuNRbC_0Ltimik4WQ=ADNodeInfo{version=2.2.0, isEligibleDataNode=false}, wQZ7icbpTzKmgRYfUQavbQ=ADNodeInfo{version=2.2.0, isEligibleDataNode=true}, yXF2C00OSPe6GZWqZp5DCg=ADNodeInfo{version=2.2.0,
isEligibleDataNode=false}, V8QgnGeYTyqo8lZ6DSlLdw=ADNodeInfo{version=2.2.0, isEligibleDataNode=false}}
[2022-09-28T03:40:14,673][INFO ][o.o.a.c.ADClusterEventListener] [cluster-masters-0] Hash ring build result: true
[2022-09-28T03:40:14,681][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [cluster-masters-0] Detected cluster change event for destination migration
[2022-09-28T03:40:16,785][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:19,282][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:21,783][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:24,285][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:26,786][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:29,282][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:31,788][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:34,287][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:36,787][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:37,494][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54112
[2022-09-28T03:40:39,285][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:39,589][INFO ][o.o.i.i.ManagedIndexCoordinator] [cluster-masters-0] Cancel background move metadata process.
[2022-09-28T03:40:39,589][INFO ][o.o.i.i.ManagedIndexCoordinator] [cluster-masters-0] Performing move cluster state metadata.
[2022-09-28T03:40:39,589][INFO ][o.o.i.i.MetadataService ] [cluster-masters-0] Move metadata has finished.
[2022-09-28T03:40:41,788][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:44,286][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:46,789][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:49,288][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:51,817][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:54,291][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:56,795][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:40:59,292][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:01,788][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:04,193][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54306
[2022-09-28T03:41:04,789][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:06,793][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:07,482][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54318
[2022-09-28T03:41:09,298][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:11,792][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:14,297][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:16,803][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:19,296][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:21,799][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:24,296][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:26,799][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:29,298][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:31,806][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:34,302][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:36,802][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:37,485][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54516
[2022-09-28T03:41:39,297][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:41,798][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:44,307][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:46,804][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:49,302][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:51,801][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:54,301][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:56,806][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:41:59,300][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:01,800][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:04,303][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:06,803][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:07,496][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54702
[2022-09-28T03:42:08,194][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54726
[2022-09-28T03:42:09,299][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:11,802][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:14,299][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:16,800][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:19,301][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:21,798][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:24,305][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:26,801][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:29,305][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:31,801][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:34,304][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:36,799][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:37,487][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:54906
[2022-09-28T03:42:39,300][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:41,799][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:44,300][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:46,803][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:49,303][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:51,800][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:54,298][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:56,802][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:42:59,299][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:01,804][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:04,296][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:06,801][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:07,480][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55098
[2022-09-28T03:43:09,304][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:11,800][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:14,301][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:16,805][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:19,302][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:20,592][INFO ][o.o.j.s.JobSweeper ] [cluster-masters-0] Running full sweep
[2022-09-28T03:43:21,805][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:24,298][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:26,806][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:29,303][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:31,800][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:33,192][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55278
[2022-09-28T03:43:34,299][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
[2022-09-28T03:43:36,801][WARN ][o.o.s.a.BackendRegistry ] [cluster-masters-0] Authentication finally failed for admin from 10.10.14.37:39696
```
Check the "env" of cluster-masters-1 and cluster-master-2 , OPENSEARCH_PASSWORD=admin123, but cluster-masters-0's env is "OPENSEARCH_PASSWORD=admin"
Why is cluster-masters-0 not restarted?
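For reference, one way to compare the password each pod actually received (namespace as in the log command above; `kubectl exec` targets the default container of each pod):

```bash
for p in cluster-masters-0 cluster-masters-1 cluster-masters-2; do
  echo -n "$p: "
  kubectl -n os exec "$p" -- printenv OPENSEARCH_PASSWORD
done
```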
I am observing the same phenomenon in our OpenSearch cluster. I did the same steps and got the same result. It would be great to get some help understanding where the issue could be.
Here's the sequence of changes I've tried:
- Bootstrap a cluster with defaults - works.
- Change the password hash and secret and reapply the charts - works.
- The securityconfig pod spins up and reapplies the security config.
- At this stage, I was able to curl the cluster with the new credentials.
- Next, I could see that the operator relaunches the master nodes, and this is when the nodes start getting the authentication errors (a sketch of how this shows up follows this list).
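A rough sketch of how the stall shows up, assuming the same namespace `os` as above:

```bash
# Watch the rolling restart; the last not-yet-updated pod stays 0/1 Ready.
kubectl -n os get pods -w
# The readiness probe failures are visible in the pod events.
kubectl -n os describe pod cluster-masters-0 | grep -i -A2 readiness
```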
Hi @jinchengsix. I managed to reproduce your setup and I think I can explain the cause: changing the admin password in the secret will not in itself update any of the pods (because the password is used there as an env var, as you pointed out, it will not be updated without a pod restart). But changing the securityconfig will update the password for the OpenSearch cluster (by the way, the operator will pick up any changes in the securityconfig secret and rerun the update job within 30 seconds or so, without you having to rename it). This leads to the readiness probes on the pods failing due to the password mismatch. When the operator performs a rolling restart (in my case I had to trigger that by making a dummy change to the config), it will only continue if all replicas of the statefulset it's working on are ready. But because the readiness probe fails, the not-yet-updated replicas become unready and the operator stalls. In your case the time until they go unready is just long enough for the operator to update all but the first pod.
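In other words, a rough sketch of the effect (not the operator's actual probe code; the env var name is taken from the pod environment shown above):

```bash
# What the readiness check effectively does today: authenticate with the password
# that was baked into the container environment when the pod started.
curl -k -u "admin:${OPENSEARCH_PASSWORD}" --silent --fail https://localhost:9200
# Once internal_users.yml carries the new hash, this returns 401 on every pod that
# still has OPENSEARCH_PASSWORD=admin, the pod goes unready, and the rolling
# restart stops making progress.
```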
I believe we will need to add some logic to the operator to deal with changing admin passwords. @prudhvigodithi Do you have ideas? Could we change the readiness probe to take the password not from an env var but from a file-mounted secret? That way a changed password would distribute to all pods automatically.
As I understand it, the options are either reading the admin credentials from a secret, or changing the logic around the readiness probe to allow a rolling restart in this case.
In the case of reading from a secret, I would still allow using env vars as an alternative, and we need to make sure that watching the secret does not cause all pods to restart at once, but rolls them gradually.
@idanl21 @prudhvigodithi wdyt?
Hey @swoehrl-mw, I didn't fully understand your fix. Are you suggesting saving the admin's credentials inside some mounted file, and creating a reconciler that watches secrets all the time, so that when a secret has changed it will change the credentials inside the file and take care of the rolling restart?
@idanl21 My idea was to add a volumeMount for the admin-password secret to the pods of the statefulset (e.g. under /mnt/admin-credentials) and then rewrite the probe command to `curl -k -u \"$(cat /mnt/admin-credentials/username):$(cat /mnt/admin-credentials/password)\" --silent --fail https://localhost:9200` so that the password is retrieved from the file-mounted secret each time. That way, if a user changes the admin password and updates the secret, Kubernetes will automatically propagate the change to the mounts, so the next time the probe runs it picks up the new password. No restart is needed, because changes to file-mounted secrets propagate automatically (handled by Kubernetes) while env vars do not.
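A quick way to see the propagation behaviour once such a mount exists (paths and names as proposed above; the kubelet refresh typically takes well under a minute):

```bash
# Value the pod currently sees through the proposed mount
kubectl -n os exec cluster-masters-0 -- cat /mnt/admin-credentials/password; echo

# Update the secret in place
kubectl -n os create secret generic admin-credentials-secret \
  --from-literal=username=admin --from-literal=password=admin123 \
  --dry-run=client -o yaml | kubectl -n os apply -f -

# Re-running the first command after a short wait shows the new value without any
# pod restart, whereas an env var sourced from the same secret stays unchanged.
kubectl -n os exec cluster-masters-0 -- cat /mnt/admin-credentials/password; echo
```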
Hey @swoehrl-mw, that would work as well. Right, the password change is accepted at cluster startup by updating the `securityConfigSecret` and `adminCredentialsSecret` configs. I have labeled this as a bug; we should come up with a solution to make sure the password change is done in a rolling way, without cluster disruption.
@prudhvigodithi @idanl21 @segalziv I have implemented my proposed solution in PR #358. Please check it out, test it and provide your opinion.
A solution to this problem was implemented in the linked PR and merged to main. Closing this issue as solved.