opensearch-k8s-operator

Opensearch operator 2.0.4 does not work with the example opensearch cluster manifest.

Open dobharweim opened this issue 2 years ago • 31 comments

Description

The example OpenSearch cluster YAML does not work with version 2.0.4 of the operator.

Steps to reproduce

  1. Add the Helm repo for the operator: helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
  2. Install v2.0.4 of the operator: helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.0.4
  3. Wait for the operator pods to be ready.
  4. Apply the manifest for the example OpenSearch cluster: kubectl apply -f ~/Git/opensearch-k8s-operator/opensearch-operator/examples/opensearch-cluster.yaml
  5. After ~10 minutes, check the cluster.
  6. In this instance the master and one coordinator node are restarting.
  7. Describe the master pod to debug: kubectl describe pod my-cluster-masters-0 (log attached: my-cluster-masters-0.log)
  8. Describe the coordinator pod to debug: kubectl describe pod my-cluster-coordinators-1 (log attached: my-cluster-coordinators-1.log)
  9. Both pods terminated with exit code 137.
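Exit code 137 is not arbitrary: container exit codes above 128 encode 128 plus the number of the fatal signal, so 137 means the process received SIGKILL (9), which on Kubernetes is most often an OOM kill by the kernel or the kubelet killing the container after a failed liveness probe. A quick sanity check of that arithmetic:

```python
import signal

# Container exit codes above 128 mean the process died from a signal:
# exit_code = 128 + signal_number, so 137 -> signal 9.
exit_code = 137
signum = exit_code - 128
print(signal.Signals(signum).name)  # SIGKILL
```

If the kill is memory-related, `kubectl describe pod` normally shows `Reason: OOMKilled` on the terminated container state, which distinguishes it from a probe-driven kill.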

Steps to workaround

  1. I have tried doubling the memory requests and limits to 4Gi for each node group, but I am still getting the 137 error and the pods are killed. Any advice on this would be greatly appreciated. Could you also supply a working example YAML for the latest OpenSearch images? I think 1.3.0 is 5 months old now. Thanks.

dobharweim avatar Aug 22 '22 13:08 dobharweim

I also see that the data nodes' logs are full of SSL/TLS errors. Attaching the file here; the errors look like:

[2022-08-22T13:17:45,052][WARN ][o.o.h.AbstractHttpServerTransport] [my-cluster-nodes-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=0.0.0.0/0.0.0.0:9200, remoteAddress=null}
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.ja…
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1213) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final]
	... 16 more
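The hex blob in the NotSslRecordException is the raw payload the TLS decoder rejected. Decoding it shows it is a plain-HTTP request from a Sysdig monitoring agent probing the TLS-enabled port 9200, i.e. expected noise from a non-TLS health check rather than a fault inside the cluster (the hex string below is the one from the log, split across lines only for readability):

```python
# Decode the payload hex-dumped by NotSslRecordException above.
payload_hex = (
    "474554202f20485454502f312e310d0a"
    "486f73743a206c6f63616c686f73743a393230300d0a"
    "557365722d4167656e743a20537973646967204167656e742f312e300d0a"
    "4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a"
    "4163636570743a20746578742f68746d6c2c202a2f2a0d0a"
    "436f6e6e656374696f6e3a206b6565702d616c6976650d0a"
    "436f6e74656e742d547970653a206170706c69636174696f6e2f"
    "782d7777772d666f726d2d75726c656e636f6465640d0a0d0a"
)
decoded = bytes.fromhex(payload_hex).decode("ascii")
print(decoded)  # GET / HTTP/1.1 ... User-Agent: Sysdig Agent/1.0 ...
```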

my-cluster-nodes-0.log.zip

dobharweim avatar Aug 22 '22 14:08 dobharweim

I have recreated the deployment, giving the master nodeset 12Gi memory requests and limits. Still getting 137.

dobharweim avatar Aug 22 '22 14:08 dobharweim

You need to change the versions in lines 8 and 14 to version: 2.2.0,

and then check these instructions.

These should be added to the documentation.

dickescheid avatar Aug 23 '22 08:08 dickescheid

Hey @dickescheid, thanks a million for the feedback. I have followed up on your suggestion but am still seeing failures.

I have removed everything (the old OpenSearchCluster, the PVCs, the operator, etc.) and then:

  1. Installed the latest version of the operator: helm install opensearch-operator opensearch-operator/opensearch-operator --version 2.0.4
  2. Applied the following two secrets to the default namespace:
apiVersion: v1
kind: Secret
metadata:
  name: admin-credentials-secret
type: Opaque
data:
  # admin
  username: YWRtaW4=
  # test
  password: dGVzdA==
---
apiVersion: v1
kind: Secret
metadata:
  name: securityconfig-secret
type: Opaque
## admin password hash for test "$2y$12$B6GMBQIwOUEV2qtBQrpJL.37MUMp1XkLxCyWzeTH5Q94QxNjw8ng6"
stringData:
  internal_users.yml: |-
    _meta:
      type: "internalusers"
      config_version: 2
    admin:
      hash: "$2y$12$tS0wrbNssQpVjOXDPrzqdO5phJC/Fmb9fNKSdJ9P2voGK.LNIqLxG"
      reserved: true
      backend_roles:
      - "admin"
      description: "Demo admin user"
    dashboarduser:
      hash: "$2a$12$4AcgAt3xwOWadA5s5blL6ev39OXDNhmOesEoo33eZtrq2N0YrU3H."
      reserved: true
      description: "Demo OpenSearch Dashboards user"
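As a sanity check, the data values in the admin-credentials-secret above are just base64-encoded strings; encoding the plain values shown in the comments reproduces them exactly:

```python
import base64

# Kubernetes Secret "data" values are base64-encoded; these match the
# username/password values in the admin-credentials-secret manifest above.
encoded = [base64.b64encode(s.encode()).decode() for s in ("admin", "test")]
print(encoded)  # ['YWRtaW4=', 'dGVzdA==']
```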
  3. Applied the following OpenSearchCluster manifest:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  security:
    config:
      securityConfigSecret:
        name: securityconfig-secret
      adminCredentialsSecret:
        name: admin-credentials-secret
  general:
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    pluginsList: ["repository-s3", "https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/2.2.0.0/prometheus-exporter-2.2.0.0.zip"]
  dashboards:
    opensearchCredentialsSecret:
      name: admin-credentials-secret
    version: 2.2.0
    enable: true
    replicas: 2
    resources:
      requests:
        memory: '1Gi'
        cpu: '500m'
      limits:
        memory: '1Gi'
        cpu: '500m'
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: '30Gi'
      nodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'master'
        - 'data'
    - component: nodes
      replicas: 3
      diskSize: '30Gi'
      nodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'data'
    - component: coordinators
      replicas: 3
      diskSize: '30Gi'
      nodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'ingest'
  4. Still seeing errors.

The final error I see in the master logs is:

[2022-08-23T13:35:22,085][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Closing AuditLogImpl
[2022-08-23T13:35:22,084][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-cluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.2.0.jar:2.2.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.2.0.0.jar:2.2.0.0]
	at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-08-23T13:35:22,147][INFO ][o.o.n.Node ] [my-cluster-masters-0] closed
Killing performance analyzer process 103
OpenSearch exited with code 143
Performance analyzer exited with code 143

I have attached the logs of the previous master and coordinator pods, and of the current data node (it hasn't restarted yet).

If you have a running example of this, could you provide:

  1. The version of the operator you are using.
  2. Your full OpenSearchCluster manifest, to compare against mine?

I cannot see from the documentation what I am doing wrong. We would love to use and contribute back to the operator. Thanks a million.

dobharweim avatar Aug 23 '22 13:08 dobharweim

I first deployed a cluster like you posted, using the files from the issue I linked; it is currently running.

I just tried to deploy a new one in another namespace, but it is not working like before.

[2022-08-23T10:39:25,246][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] Directory /usr/share/opensearch/config has insecure file permissions (should be 0700)
[2022-08-23T10:39:25,246][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/opensearch.yml has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,247][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] Directory /usr/share/opensearch/config/tls-transport has insecure file permissions (should be 0700)
[2022-08-23T10:39:25,247][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,247][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/tls.key has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,248][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,248][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/..data has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,248][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] Directory /usr/share/opensearch/config/tls-transport/..2022_08_23_10_38_13.809426401 has insecure file permissions (should be 0700)
[2022-08-23T10:39:25,248][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_10_38_13.809426401/tls.key has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,249][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_10_38_13.809426401/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,249][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_10_38_13.809426401/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,249][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] Directory /usr/share/opensearch/config/tls-http has insecure file permissions (should be 0700)
[2022-08-23T10:39:25,250][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,250][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/tls.key has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,250][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,250][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/..data has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,251][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] Directory /usr/share/opensearch/config/tls-http/..2022_08_23_10_38_13.1216452451 has insecure file permissions (should be 0700)
[2022-08-23T10:39:25,251][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_10_38_13.1216452451/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,251][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_10_38_13.1216452451/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T10:39:25,251][WARN ][o.o.s.OpenSearchSecurityPlugin] [opensearch-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_10_38_13.1216452451/tls.key has insecure file permissions (should be 0600)

Apart from that, the master node is failing on bootstrapping; it keeps searching for a cluster manager.


[2022-08-23T10:40:13,741][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [opensearch-bootstrap-0] to bootstrap a cluster: have discovered [{opensearch-masters-0}{tVOjUuXGSMWHy79HBH7RdQ}{WjVnzewGTkC6r7TawNsFmg}{opensearch-masters-0}{10.92.3.19:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [10.92.3.18:9300] from hosts providers and [{opensearch-masters-0}{tVOjUuXGSMWHy79HBH7RdQ}{WjVnzewGTkC6r7TawNsFmg}{opensearch-masters-0}{10.92.3.19:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
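The key detail in that warning is the bracketed list after "must discover cluster-manager-eligible nodes": the node is waiting for the operator's temporary bootstrap pod, opensearch-bootstrap-0, and cannot form a cluster until it can reach it ("node term 0, last-accepted version 0" confirms it has never joined a cluster). A small sketch of pulling that field out of the log line (the shortened message string here is just an excerpt of the warning above, used for illustration):

```python
import re

# Extract which nodes a master is waiting for from a
# ClusterFormationFailureHelper warning (excerpt of the log line above).
warning = (
    "cluster-manager not discovered yet, this node has not previously joined a "
    "bootstrapped cluster, and this node must discover cluster-manager-eligible "
    "nodes [opensearch-bootstrap-0] to bootstrap a cluster"
)
match = re.search(r"must discover cluster-manager-eligible nodes \[([^\]]+)\]", warning)
print(match.group(1))  # opensearch-bootstrap-0
```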

Logs: opensearch-2.2.0-masters-0.log

dickescheid avatar Aug 23 '22 15:08 dickescheid

Strangely enough, I now get a permissions error. I have tried another version, 2.1.0 instead of 2.2.0, and also removed pluginsList.

The errors state that the permissions are wrong.

[2022-08-23T15:00:40,393][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] Directory /usr/share/opensearch/config has insecure file permissions (should be 0700)
[2022-08-23T15:00:40,393][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,394][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/roles.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,394][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/whitelist.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,394][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/action_groups.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,394][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/internal_users.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,395][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/roles_mapping.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,395][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/config.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,395][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/tenants.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,395][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/opensearch-security/nodes_dn.yml has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,396][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] Directory /usr/share/opensearch/config/tls-transport has insecure file permissions (should be 0700)
[2022-08-23T15:00:40,396][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,396][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,396][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/tls.key has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,397][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/..data has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,397][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] Directory /usr/share/opensearch/config/tls-transport/..2022_08_23_14_59_56.2439133386 has insecure file permissions (should be 0700)
[2022-08-23T15:00:40,397][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_14_59_56.2439133386/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,397][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_14_59_56.2439133386/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,398][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-transport/..2022_08_23_14_59_56.2439133386/tls.key has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,398][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] Directory /usr/share/opensearch/config/tls-http has insecure file permissions (should be 0700)
[2022-08-23T15:00:40,398][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/tls.key has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,398][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/tls.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,399][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,399][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/..data has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,399][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] Directory /usr/share/opensearch/config/tls-http/..2022_08_23_14_59_56.3752168266 has insecure file permissions (should be 0700)
[2022-08-23T15:00:40,400][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_14_59_56.3752168266/ca.crt has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,400][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_14_59_56.3752168266/tls.key has insecure file permissions (should be 0600)
[2022-08-23T15:00:40,400][WARN ][o.o.s.OpenSearchSecurityPlugin] [aicluster-masters-0] File /usr/share/opensearch/config/tls-http/..2022_08_23_14_59_56.3752168266/tls.crt has insecure file permissions (should be 0600)
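These warnings typically appear because Kubernetes mounts Secret and ConfigMap volumes with world-readable modes (0644 for files, 0755 for directories by default, tunable via the volume's defaultMode), while the security plugin wants config files closed to group and other (0600/0700). A rough sketch of the check the plugin is effectively making, assuming it simply tests the group/other permission bits:

```python
import os
import tempfile

def is_insecure(path: str) -> bool:
    # Flag any file or directory that grants group/other access,
    # i.e. anything looser than 0600 (files) / 0700 (directories).
    return os.stat(path).st_mode & 0o077 != 0

fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o644)     # typical mode of a mounted Secret file
print(is_insecure(path))  # True: group/other can read it
os.chmod(path, 0o600)     # what the plugin wants for files
print(is_insecure(path))  # False
os.remove(path)
```

The warnings are noisy but non-fatal; the bootstrapping failure below is a separate problem.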

and then it fails on bootstrapping:

[2022-08-23T15:01:43,307][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [aicluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:204) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:174) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:102) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at java.lang.Thread.run(Thread.java:833) [?:?]
[2022-08-23T15:01:47,293][ERROR][o.o.s.a.BackendRegistry  ] [aicluster-masters-0] Not yet initialized (you may need to run securityadmin)
[2022-08-23T15:01:51,319][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [aicluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:204) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:202) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:174) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:102) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.1.0.jar:2.1.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.1.0.0.jar:2.1.0.0]
	at java.lang.Thread.run(Thread.java:833) [?:?]

opensearch-aicluster-2.1.0-masters-0.log

dickescheid avatar Aug 23 '22 15:08 dickescheid

I sometimes also see these issues in the logs. I have tried so many combinations of settings and versions that I am not certain what causes it.

So you got the manifests from my comment above up and running on a cluster? I'm running on IKS from IBM Cloud; it is CNCF certified, but I guess you are running in another environment?

I am going to try a stripped-down deployment on a minikube cluster.

dobharweim avatar Aug 23 '22 15:08 dobharweim

Which version of the operator are you running?

dobharweim avatar Aug 23 '22 15:08 dobharweim

On minikube with operator version 2.0.4, OpenSearch 2.2.0, and the following manifest:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  security:
    config:
      securityConfigSecret:
        name: securityconfig-secret
      adminCredentialsSecret:
        name: admin-credentials-secret
  general:
    # additionalConfig:
    #   cluster.initial_master_nodes: 'my-cluster-masters-0'
    #   discovery.seed_hosts: 'opensearch-cluster-master-headless'
      # network.host: 0.0.0.0
      # network.bind_host: 0.0.0.0
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    pluginsList: ["repository-s3"," https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/2.2.0.0/prometheus-exporter-2.2.0.0.zip"]
  dashboards:
    opensearchCredentialsSecret:
      name: admin-credentials-secret
    version: 2.2.0
    enable: true
    replicas: 0
    resources:
      requests:
        memory: '1Gi'
        cpu: '500m'
      limits:
        memory: '1Gi'
        cpu: '500m'
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 1
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'master'
        - 'data'
      persistence:
        emptyDir: {}
    - component: nodes
      replicas: 1
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'data'
      persistence:
        emptyDir: {}
    - component: coordinators
      replicas: 0
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'ingest'
      persistence:
        emptyDir: {}

I am getting the bootstrap errors:

[2022-08-23T15:56:35,430][ERROR][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Not yet initialized (you may need to run securityadmin) [2022-08-23T15:56:38,603][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0 [2022-08-23T15:56:44,690][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-cluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security) org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]; at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.2.0.jar:2.2.0] at 
org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.2.0.0.jar:2.2.0.0] at 
org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.2.0.0.jar:2.2.0.0] at java.lang.Thread.run(Thread.java:833) [?:?] [2022-08-23T15:56:48,605][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0 [2022-08-23T15:56:57,671][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-cluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security) org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]; at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.2.0.jar:2.2.0] at 
org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) 
[opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.2.0.0.jar:2.2.0.0] at java.lang.Thread.run(Thread.java:833) [?:?] 
[2022-08-23T15:56:58,585][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0 [2022-08-23T15:57:05,415][ERROR][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Not yet initialized (you may need to run securityadmin) [2022-08-23T15:57:08,586][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{Xfi58xuxTlu9BaF_yz5xNA}{Zmo6jTwvQO-h_rTi7NZWAQ}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0 [2022-08-23T15:57:10,673][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [my-cluster-masters-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, ALLOWLIST, AUDIT] (index=.opendistro_security) 
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]; at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:204) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:190) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:81) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:58) ~[opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:232) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:149) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) [opensearch-2.2.0.jar:2.2.0] at 
org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:539) [opensearch-2.2.0.jar:2.2.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:207) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:98) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:372) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:318) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:303) [opensearch-security-2.2.0.0.jar:2.2.0.0] at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:163) [opensearch-security-2.2.0.0.jar:2.2.0.0] at java.lang.Thread.run(Thread.java:833) [?:?] my-cluster-masters-0.log

dobharweim avatar Aug 23 '22 15:08 dobharweim
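The key signal buried in the flattened dump above is the `ClusterFormationFailureHelper` warning: the master is waiting for a bootstrap node that it never discovers. A small sketch (the regex is my own, and the sample line is shortened from the log above) that pulls out who is waiting for whom:

```python
import re

# A (shortened) ClusterFormationFailureHelper line from the dump above.
line = (
    "[2022-08-23T15:56:38,603][WARN ][o.o.c.c.ClusterFormationFailureHelper] "
    "[my-cluster-masters-0] cluster-manager not discovered yet, this node has not "
    "previously joined a bootstrapped cluster, and this node must discover "
    "cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster"
)

# Extract the node emitting the warning and the bootstrap node it is waiting for.
match = re.search(
    r"\[(?P<node>[\w.-]+)\] cluster-manager not discovered yet.*"
    r"cluster-manager-eligible nodes \[(?P<expected>[\w.-]+)\]",
    line,
)
if match:
    print(match["node"], "is waiting for", match["expected"])
```

Here the master expects `my-cluster-bootstrap-0` to exist, so if the operator's bootstrap pod was never created (or was deleted before the cluster formed), the master can never join a cluster.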

I'm running opensearch-operator 2.0.4 and have tried removing the operator and applying it again as well. I destroyed the previously working cluster and tried recreating it; that also failed. Currently I am only getting the bootstrap issues.

The only thing that changed between when the OpenSearch cluster was working and now is the Kubernetes cluster version. I updated from v1.22 to 1.23.7-gke.1400 in the last few days.

My Kubernetes cluster is GKE on Google Cloud.

dickescheid avatar Aug 23 '22 16:08 dickescheid

OK, I am trying the same YAML from the comment above on a new minikube running 1.22.0. Glad to know it is not just me seeing this issue; I wonder if the example manifest is simply out of date and missing configuration required by later versions of the operator.

dobharweim avatar Aug 23 '22 16:08 dobharweim

Still getting the following:

```
[2022-08-23T16:18:28,406][INFO ][o.o.s.c.ConfigurationRepository] [my-cluster-masters-0] Will attempt to create index .opendistro_security and default configs if they are absent
[2022-08-23T16:18:28,407][INFO ][o.o.s.c.ConfigurationRepository] [my-cluster-masters-0] Background init thread started. Install default config?: true
[2022-08-23T16:18:28,407][INFO ][o.o.s.OpenSearchSecurityPlugin] [my-cluster-masters-0] 0 OpenSearch Security modules loaded so far: []
[2022-08-23T16:18:38,311][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{c5qpmY3yQXq7kj3FRVRfyw}{1eX_y2wDRMywHoa3JdntGw}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{c5qpmY3yQXq7kj3FRVRfyw}{1eX_y2wDRMywHoa3JdntGw}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2022-08-23T16:18:38,610][ERROR][o.o.s.a.BackendRegistry  ] [my-cluster-masters-0] Not yet initialized (you may need to run securityadmin)
[2022-08-23T16:18:48,314][WARN ][o.o.c.c.ClusterFormationFailureHelper] [my-cluster-masters-0] cluster-manager not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover cluster-manager-eligible nodes [my-cluster-bootstrap-0] to bootstrap a cluster: have discovered [{my-cluster-masters-0}{c5qpmY3yQXq7kj3FRVRfyw}{1eX_y2wDRMywHoa3JdntGw}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}]; discovery will continue using [172.17.0.5:9300] from hosts providers and [{my-cluster-masters-0}{c5qpmY3yQXq7kj3FRVRfyw}{1eX_y2wDRMywHoa3JdntGw}{my-cluster-masters-0}{172.17.0.4:9300}{dm}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2022-08-23T16:18:48,498][ERROR][o.o.s.a.BackendRegistry  ] [my-cluster-masters-0] Not yet initialized (you may need to run securityadmin)
```

dobharweim avatar Aug 23 '22 16:08 dobharweim

With the following versions:

Operator: 2.0.4
OpenSearch: 2.2.0
Kubernetes: 1.22.0

and the following yaml:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  # security:
  #   config:
  #     securityConfigSecret:
  #       name: securityconfig-secret
  #     adminCredentialsSecret:
  #       name: admin-credentials-secret
  general:
    additionalConfig:
      cluster.initial_master_nodes: 'my-cluster-masters-0'
      # discovery.seed_hosts: 'my-cluster-masters-0'
      network.host: 0.0.0.0
      network.bind_host: 0.0.0.0
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    # pluginsList: ["repository-s3"," https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/2.2.0.0/prometheus-exporter-2.2.0.0.zip"]
  dashboards:
    # opensearchCredentialsSecret:
    #   name: admin-credentials-secret
    version: 2.2.0
    enable: true
    replicas: 0
    resources:
      requests:
        memory: '1Gi'
        cpu: '500m'
      limits:
        memory: '1Gi'
        cpu: '500m'
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 1
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'master'
        - 'data'
      persistence:
        emptyDir: {}
    - component: nodes
      replicas: 1
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'data'
      persistence:
        emptyDir: {}
    - component: coordinators
      replicas: 0
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'ingest'
      persistence:
        emptyDir: {}
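As a quick sanity check on manifests like the one above, it helps to verify that the node pools actually expose at least one master-eligible replica, since a cluster with zero master-eligible nodes can never bootstrap (matching the "cluster-manager not discovered" symptom). A sketch using a plain dict standing in for the parsed manifest (in practice you would load the full YAML with a parser):

```python
# Trimmed-down, hand-parsed version of the nodePools section above.
manifest = {
    "spec": {
        "nodePools": [
            {"component": "masters", "replicas": 1, "roles": ["master", "data"]},
            {"component": "nodes", "replicas": 1, "roles": ["data"]},
            {"component": "coordinators", "replicas": 0, "roles": ["ingest"]},
        ]
    }
}

pools = manifest["spec"]["nodePools"]
# Count replicas across all pools that carry the master role.
master_replicas = sum(p["replicas"] for p in pools if "master" in p["roles"])

print("master-eligible replicas:", master_replicas)  # -> 1
```

With a single master replica the cluster can form, but it is also a single point of failure during restarts, which may be relevant to the restart loops reported earlier in this issue.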

The master and data nodes are coming up. But if I enable the security config settings with the secrets, I see that the readiness probes fail.

[2022-08-23T16:26:14,284][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:14,682][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'config' and version 2 is updated in .opendistro_security index. [2022-08-23T16:26:14,682][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'roles' with /usr/share/opensearch/config/opensearch-security/roles.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false [2022-08-23T16:26:14,782][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:14,973][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:14,994][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'roles' and version 2 is updated in .opendistro_security index. [2022-08-23T16:26:14,995][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'rolesmapping' with /usr/share/opensearch/config/opensearch-security/roles_mapping.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false [2022-08-23T16:26:15,171][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:15,272][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:15,290][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'rolesmapping' and version 2 is updated in .opendistro_security index. 
[2022-08-23T16:26:15,291][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'internalusers' with /usr/share/opensearch/config/opensearch-security/internal_users.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false [2022-08-23T16:26:15,387][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:15,492][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:15,579][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'internalusers' and version 2 is updated in .opendistro_security index. [2022-08-23T16:26:15,580][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'actiongroups' with /usr/share/opensearch/config/opensearch-security/action_groups.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false [2022-08-23T16:26:15,672][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:15,773][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:15,876][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'actiongroups' and version 2 is updated in .opendistro_security index. 
[2022-08-23T16:26:15,878][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'tenants' with /usr/share/opensearch/config/opensearch-security/tenants.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false [2022-08-23T16:26:15,975][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:16,089][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:16,107][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'tenants' and version 2 is updated in .opendistro_security index. [2022-08-23T16:26:16,173][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'nodesdn' with /usr/share/opensearch/config/opensearch-security/nodes_dn.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=true [2022-08-23T16:26:16,277][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:16,474][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:16,579][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'nodesdn' and version 2 is updated in .opendistro_security index. 
[2022-08-23T16:26:16,579][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'whitelist' with /usr/share/opensearch/config/opensearch-security/whitelist.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=true [2022-08-23T16:26:16,678][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:16,875][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:17,103][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'whitelist' and version 2 is updated in .opendistro_security index. [2022-08-23T16:26:17,104][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'allowlist' with /usr/share/opensearch/config/opensearch-security/allowlist.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=true [2022-08-23T16:26:17,178][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc] [2022-08-23T16:26:17,290][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration [2022-08-23T16:26:17,384][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'allowlist' and version 2 is updated in .opendistro_security index. 
[2022-08-23T16:26:17,385][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Will update 'audit' with /usr/share/opensearch/config/opensearch-security/audit.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false
[2022-08-23T16:26:17,574][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [.opendistro_security/Mk-Xr8o9QFyXtriFOklxjg] update_mapping [_doc]
[2022-08-23T16:26:17,678][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration
[2022-08-23T16:26:17,778][INFO ][o.o.s.s.ConfigHelper ] [my-cluster-masters-0] Doc with id 'audit' and version 2 is updated in .opendistro_security index.
[2022-08-23T16:26:18,577][INFO ][stdout ] [my-cluster-masters-0] [FINE] No subscribers registered for event class org.opensearch.security.securityconf.DynamicConfigFactory$NodesDnModelImpl
[2022-08-23T16:26:18,578][INFO ][stdout ] [my-cluster-masters-0] [FINE] No subscribers registered for event class org.greenrobot.eventbus.NoSubscriberEvent
[2022-08-23T16:26:18,578][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing on REST API is enabled.
[2022-08-23T16:26:18,579][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from REST API auditing.
[2022-08-23T16:26:18,579][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing on Transport API is enabled.
[2022-08-23T16:26:18,579][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] [AUTHENTICATED, GRANTED_PRIVILEGES] are excluded from Transport API auditing.
[2022-08-23T16:26:18,579][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing of request body is enabled.
[2022-08-23T16:26:18,580][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Bulk requests resolution is disabled during request auditing.
[2022-08-23T16:26:18,580][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Index resolution is enabled during request auditing.
[2022-08-23T16:26:18,580][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Sensitive headers auditing is enabled.
[2022-08-23T16:26:18,582][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing requests from kibanaserver users is disabled.
[2022-08-23T16:26:18,592][WARN ][o.o.s.a.r.AuditMessageRouter] [my-cluster-masters-0] No endpoint configured for categories [BAD_HEADERS, FAILED_LOGIN, MISSING_PRIVILEGES, GRANTED_PRIVILEGES, OPENDISTRO_SECURITY_INDEX_ATTEMPT, SSL_EXCEPTION, AUTHENTICATED, INDEX_EVENT, COMPLIANCE_DOC_READ, COMPLIANCE_DOC_WRITE, COMPLIANCE_EXTERNAL_CONFIG, COMPLIANCE_INTERNAL_CONFIG_READ, COMPLIANCE_INTERNAL_CONFIG_WRITE], using default endpoint
[2022-08-23T16:26:18,592][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing of external configuration is disabled.
[2022-08-23T16:26:18,593][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing of internal configuration is enabled.
[2022-08-23T16:26:18,593][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing only metadata information for read request is enabled.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing will watch {} for read requests.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing read operation requests from kibanaserver users is disabled.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing only metadata information for write request is enabled.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing diffs for write requests is disabled.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing write operation requests from kibanaserver users is disabled.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Auditing will watch <NONE> for write requests.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] .opendistro_security is used as internal security index.
[2022-08-23T16:26:18,594][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-0] Internal index used for posting audit logs is null
[2022-08-23T16:26:18,599][INFO ][o.o.s.c.ConfigurationRepository] [my-cluster-masters-0] Hot-reloading of audit configuration is enabled
[2022-08-23T16:26:18,600][INFO ][o.o.s.c.ConfigurationRepository] [my-cluster-masters-0] Node 'my-cluster-masters-0' initialized
[2022-08-23T16:26:23,174][WARN ][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55184
[2022-08-23T16:26:23,200][INFO ][o.o.c.m.MetadataCreateIndexService] [my-cluster-masters-0] [security-auditlog-2022.08.23] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
[2022-08-23T16:26:23,402][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration
[2022-08-23T16:26:23,521][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration
[2022-08-23T16:26:23,565][INFO ][o.o.c.m.MetadataMappingService] [my-cluster-masters-0] [security-auditlog-2022.08.23/CaMU9LNsSqmaosC0BxSO6Q] create_mapping
[2022-08-23T16:26:23,692][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration
[2022-08-23T16:26:52,978][WARN ][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55346
[2022-08-23T16:27:10,560][INFO ][o.o.i.i.ManagedIndexCoordinator] [my-cluster-masters-0] Performing move cluster state metadata.
[2022-08-23T16:27:10,560][INFO ][o.o.i.i.MetadataService ] [my-cluster-masters-0] ISM config index not exist, so we cancel the metadata migration job.
[2022-08-23T16:27:10,561][INFO ][o.o.i.i.ManagedIndexCoordinator] [my-cluster-masters-0] Performing ISM template migration.
[2022-08-23T16:27:10,562][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] Doing ISM template migration 1 time.
[2022-08-23T16:27:10,563][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] Use 2022-08-23T15:26:10.600Z as migrating ISM template last_updated_time
[2022-08-23T16:27:10,564][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] ISM templates: {}
[2022-08-23T16:27:10,565][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] Policies to update: []
[2022-08-23T16:27:10,569][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] Failure experienced when migrating ISM Template and update ISM policies: {}
[2022-08-23T16:27:10,733][INFO ][o.o.c.s.ClusterSettings ] [my-cluster-masters-0] updating [plugins.index_state_management.template_migration.control] from [0] to [-1]
[2022-08-23T16:27:10,734][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [my-cluster-masters-0] Detected cluster change event for destination migration
[2022-08-23T16:27:10,739][INFO ][o.o.i.m.ISMTemplateService] [my-cluster-masters-0] Successfully update template migration setting
[2022-08-23T16:27:22,757][WARN ][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55514
[2022-08-23T16:27:33,157][WARN ][o.o.s.a.BackendRegistry ] [my-cluster-masters-0] Authentication finally failed for admin from 127.0.0.1:55578

dobharweim avatar Aug 23 '22 16:08 dobharweim

When I run on IKS (K8s version 1.23.9), the master comes up but the data node fails. End of the log:

[2022-08-23T16:45:06,626][INFO ][o.o.n.Node ] [my-cluster-nodes-0] node name [my-cluster-nodes-0], node ID [lI3ryIHnSvKD47w20tTjOg], cluster name [my-cluster], roles [data]
[2022-08-23T16:45:33,825][DEPRECATION][o.o.d.c.s.Settings ] [my-cluster-nodes-0] [cluster.initial_master_nodes] setting was deprecated in OpenSearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2022-08-23T16:45:49,045][WARN ][o.o.s.c.Salt ] [my-cluster-nodes-0] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2022-08-23T16:45:49,523][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-nodes-0] Message routing enabled: true
[2022-08-23T16:45:49,965][INFO ][o.o.s.f.SecurityFilter ] [my-cluster-nodes-0] <NONE> indices are made immutable.
[2022-08-23T16:45:55,586][INFO ][o.o.a.b.ADCircuitBreakerService] [my-cluster-nodes-0] Registered memory breaker.
[2022-08-23T16:46:00,206][INFO ][o.o.m.c.b.MLCircuitBreakerService] [my-cluster-nodes-0] Registered ML memory breaker.
[2022-08-23T16:46:07,923][INFO ][o.o.t.NettyAllocator ] [my-cluster-nodes-0] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
[2022-08-23T16:46:08,805][INFO ][o.o.d.DiscoveryModule ] [my-cluster-nodes-0] using discovery type [zen] and seed hosts providers [settings]
[2022-08-23T16:46:14,965][WARN ][o.o.g.DanglingIndicesState] [my-cluster-nodes-0] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually

Manifest yaml:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  # security:
  #   config:
  #     securityConfigSecret:
  #       name: securityconfig-secret
  #     adminCredentialsSecret:
  #       name: admin-credentials-secret
  general:
    additionalConfig:
      cluster.initial_master_nodes: 'my-cluster-masters-0'
      # discovery.seed_hosts: 'my-cluster-masters-0'
      network.host: 0.0.0.0
      network.bind_host: 0.0.0.0
    version: 2.2.0
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    pluginsList: ["repository-s3","https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/2.2.0.0/prometheus-exporter-2.2.0.0.zip"]
  dashboards:
    # opensearchCredentialsSecret:
    #   name: admin-credentials-secret
    version: 2.2.0
    enable: true
    replicas: 3
    resources:
      requests:
        memory: '1Gi'
        cpu: '500m'
      limits:
        memory: '1Gi'
        cpu: '500m'
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'master'
        - 'data'
      # persistence:
      #   emptyDir: {}
    - component: nodes
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'data'
      # persistence:
      #   emptyDir: {}
    - component: coordinators
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'ingest'
      # persistence:
      #   emptyDir: {}

The node errors with 137.

image

dobharweim avatar Aug 23 '22 16:08 dobharweim

I got it working!

Current Versions:

  • Helm Chart 2.0.3
  • Opensearch 2.2.0
  • K8s v1.23.7-gke.1400

Steps:

  • Use minimal OpenSearchCluster manifest for Version 1.3.2
  • Once cluster is running change version to 2.2.0
  • Profit
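The second step (changing the version on the running cluster) is just an update to the CR's version fields. A sketch of the merge-patch body one could feed to `kubectl patch opensearchcluster <name> --type merge -p '<json>'` (cluster name and the exact field set are illustrative):

```python
import json

# Merge patch that bumps a 1.3.2 cluster in place to 2.2.0.
patch = {
    "spec": {
        "general": {"version": "2.2.0"},
        "dashboards": {"version": "2.2.0"},
    }
}
print(json.dumps(patch))
```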

What I noticed is that when deploying version 1.3.1 or 1.3.2 there is an additional bootstrap pod; this pod is not present with higher OpenSearch versions.

NAME                                     READY   STATUS    RESTARTS   AGE
opensearch-bootstrap-0                   1/1     Running   0          4m18s
opensearch-dashboards-7b65c8ff45-zz7fv   1/1     Running   0          4m18s
opensearch-masters-0                     1/1     Running   0          4m18s
opensearch-masters-1                     1/1     Running   0          2m37s
opensearch-masters-2                     0/1     Pending   0          57s

So I guess that, since there is no pod doing the bootstrapping, cluster creation fails altogether, unless you are upgrading from a previous version where a dedicated bootstrap pod exists.

Attachments: Minimal cluster manifest: minimal-cluster.yaml

dickescheid avatar Aug 24 '22 08:08 dickescheid

Steps:

  • Use minimal OpenSearchCluster manifest for Version 1.3.2
  • Once cluster is running change version to 2.2.0

I concluded something similar in #251

edwardsmit avatar Aug 24 '22 09:08 edwardsmit

I concluded something similar in #251

Nice, thanks. Good to know we got it working; bummer I didn't find it earlier, it would have saved me a lot of work.

Then we'll have to wait for a fix.

dickescheid avatar Aug 24 '22 09:08 dickescheid

Current Versions:

  • Helm Chart 2.0.3
  • Opensearch 1.3.2
  • K8s v1.23.9+IKS

Cluster definition manifest:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  general:
    version: 1.3.2
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    # pluginsList: ['repository-s3',' https://github.com/aiven/prometheus-exporter-plugin-for-opensearch/releases/download/1.3.2.0/prometheus-exporter-1.3.2.0.zip']
  dashboards:
    version: 1.3.2
    enable: true
    replicas: 2
    resources:
      requests:
        memory: '1Gi'
        cpu: '500m'
      limits:
        memory: '1Gi'
        cpu: '500m'
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'master'
        - 'data'
    - component: nodes
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'data'
    - component: coordinators
      replicas: 3
      diskSize: '30Gi'
      NodeSelector:
      resources:
        requests:
          memory: '2Gi'
          cpu: '500m'
        limits:
          memory: '2Gi'
          cpu: '500m'
      roles:
        - 'ingest'

Behaviour:

Some pods come up while others fail. image

Bootstrap pod

Running but full of TLS errors:

[2022-08-24T12:21:34,049][WARN ][o.o.h.AbstractHttpServerTransport] [my-cluster-bootstrap-0] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=0.0.0.0/0.0.0.0:9200, remoteAddress=null}
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:480) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final] at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1213) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] ... 16 more
[2022-08-24T12:21:34,069][ERROR][o.o.s.s.h.n.SecuritySSLNettyHttpServerTransport] [my-cluster-bootstrap-0] Exception during establishing a SSL connection: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f6e6f6465732f5f6c6f63616c2f73746174733f616c6c3d7472756520485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f6e6f6465732f5f6c6f63616c2f73746174733f616c6c3d7472756520485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a20537973646967204167656e742f312e300d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a4163636570743a20746578742f68746d6c2c202a2f2a0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a436f6e74656e742d547970653a206170706c69636174696f6e2f782d7777772d666f726d2d75726c656e636f6465640d0a0d0a
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1213) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283) ~[netty-handler-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) ~[netty-codec-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final] at java.lang.Thread.run(Thread.java:829) [?:?]
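As a side note, the hex blob in those NotSslRecordException messages is just the raw bytes of the rejected request. Decoding it (a quick Python sketch; only the first part of the payload from the log is shown) reveals a plaintext HTTP probe, apparently from a Sysdig agent health check, hitting the TLS-enabled port 9200 — so these particular warnings are monitoring noise rather than a cluster fault:

```python
# Decode the hex payload from the "not an SSL/TLS record" message
# (first three header lines of the payload taken verbatim from the log above).
payload_hex = (
    "474554202f20485454502f312e310d0a"
    "486f73743a206c6f63616c686f73743a393230300d0a"
    "557365722d4167656e743a20537973646967204167656e742f312e300d0a"
)
print(bytes.fromhex(payload_hex).decode("ascii"))
# GET / HTTP/1.1
# Host: localhost:9200
# User-Agent: Sysdig Agent/1.0
```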

Failing master pod

Terminated, status 0 image

Logs:

[2022-08-24T12:21:51,381][DEPRECATION][o.o.d.c.s.Settings ] [my-cluster-masters-2] [node.max_local_storage_nodes] setting was deprecated in OpenSearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2022-08-24T12:21:51,487][INFO ][o.o.e.NodeEnvironment ] [my-cluster-masters-2] using [1] data paths, mounts [[/usr/share/opensearch/data (fsf-fra0401b-fz.service.softlayer.com:/IBM02SEV292065_1332/data01)]], net usable_space [30gb], net total_space [30gb], types [nfs4]
[2022-08-24T12:21:51,488][INFO ][o.o.e.NodeEnvironment ] [my-cluster-masters-2] heap size [512mb], compressed ordinary object pointers [true]
[2022-08-24T12:21:52,181][INFO ][o.o.n.Node ] [my-cluster-masters-2] node name [my-cluster-masters-2], node ID [uS2iVk61Rh6aiK6eWpxdDw], cluster name [my-cluster], roles [master, data]
[2022-08-24T12:22:45,002][WARN ][o.o.s.c.Salt ] [my-cluster-masters-2] If you plan to use field masking pls configure compliance salt e1ukloTsQlOgPquJ to be a random string of 16 chars length identical on all nodes
[2022-08-24T12:22:45,439][INFO ][o.o.s.a.i.AuditLogImpl ] [my-cluster-masters-2] Message routing enabled: true
[2022-08-24T12:22:46,382][INFO ][o.o.s.f.SecurityFilter ] [my-cluster-masters-2] <NONE> indices are made immutable.
[2022-08-24T12:22:52,401][INFO ][o.o.a.b.ADCircuitBreakerService] [my-cluster-masters-2] Registered memory breaker.
[2022-08-24T12:22:56,340][INFO ][o.o.m.c.b.MLCircuitBreakerService] [my-cluster-masters-2] Registered ML memory breaker.
[2022-08-24T12:23:03,205][INFO ][o.o.t.NettyAllocator ] [my-cluster-masters-2] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=512mb}]
[2022-08-24T12:23:04,084][INFO ][o.o.d.DiscoveryModule ] [my-cluster-masters-2] using discovery type [zen] and seed hosts providers [settings]
Killing opensearch process 102
[2022-08-24T12:23:07,241][INFO ][o.o.s.a.r.AuditMessageRouter] [my-cluster-masters-2] Closing AuditMessageRouter
Killing performance analyzer process 103
OpenSearch exited with code 143
Performance analyzer exited with code 143

dobharweim avatar Aug 24 '22 12:08 dobharweim

did you try with the minimal cluster deployment from the Quickstart and upgrading from there?

The Helm chart for the opensearch-operator can also be the latest 2.0.4.

dickescheid avatar Aug 24 '22 12:08 dickescheid

Yes, unfortunately I see a similar situation with the minimal deployment.


dobharweim avatar Aug 26 '22 01:08 dobharweim

We are facing the same here. Recent versions won't run the bootstrap pod under some conditions. We looked at the controller pod; it stops receiving events (or they aren't being triggered). This only happens when using OpenSearch 2.x and later. If, with the same config, we change to 1.3.2 and try again, the bootstrap pod launches and everything works.

danielbichuetti avatar Aug 28 '22 15:08 danielbichuetti

Hey @danielbichuetti, can you please send us the YAML file that you deployed on v2.x? I want to test it in our env. Thanks!

idanl21 avatar Sep 01 '22 09:09 idanl21

Hi @idanl21! Sure I can. Here is the CR:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  annotations:
    pulumi.com/patchForce: 'true'
  creationTimestamp: '2022-08-29T12:28:39Z'
  finalizers:
    - Opster
  generation: 2
  managedFields:
    - apiVersion: opensearch.opster.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:pulumi.com/patchForce: {}
        f:spec:
          f:confMgmt:
            f:smartScaler: {}
          f:dashboards:
            f:additionalConfig:
              f:opensearch.password: {}
              f:opensearch.username: {}
            f:enable: {}
            f:opensearchCredentialsSecret:
              f:name: {}
            f:replicas: {}
            f:resources:
              f:limits:
                f:cpu: {}
                f:memory: {}
              f:requests:
                f:cpu: {}
                f:memory: {}
            f:tls:
              f:enable: {}
              f:generate: {}
            f:version: {}
          f:general:
            f:serviceName: {}
            f:setVMMaxMapCount: {}
            f:version: {}
          f:nodePools: {}
          f:security:
            f:config:
              f:adminCredentialsSecret:
                f:name: {}
              f:securityConfigSecret:
                f:name: {}
            f:tls:
              f:http:
                f:generate: {}
              f:transport:
                f:generate: {}
      manager: pulumi-kubernetes-7bee476a
      operation: Apply
      time: '2022-08-29T12:28:39Z'
    - apiVersion: opensearch.opster.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"Opster": {}
        f:spec:
          f:bootstrap:
            .: {}
            f:resources: {}
          f:dashboards:
            f:tls:
              f:caSecret: {}
              f:secret: {}
          f:security:
            f:config:
              f:adminSecret: {}
            f:tls:
              f:http:
                f:caSecret: {}
                f:secret: {}
              f:transport:
                f:caSecret: {}
                f:secret: {}
      manager: manager
      operation: Update
      time: '2022-08-29T12:28:39Z'
    - apiVersion: opensearch.opster.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:componentsStatus: {}
          f:initialized: {}
          f:phase: {}
          f:version: {}
      manager: manager
      operation: Update
      subresource: status
      time: '2022-08-29T12:28:59Z'
  name: opensearch-cluster
  namespace: opensearch-b329608d
  resourceVersion: '4377'
  uid: 84235311-08a2-4a39-a2f5-44e2597c6bd6
  selfLink: >-
    /apis/opensearch.opster.io/v1/namespaces/opensearch-b329608d/opensearchclusters/opensearch-cluster
status:
  componentsStatus:
    - {}
  initialized: true
  phase: RUNNING
  version: 2.2.0
spec:
  bootstrap:
    resources: {}
  confMgmt:
    smartScaler: true
  dashboards:
    additionalConfig:
      opensearch.password: D*ypBxf*2$*v0R6A4KPpUWsb
      opensearch.username: admin
    enable: true
    opensearchCredentialsSecret:
      name: opensearch-cluster-dashboarduser-secret
    replicas: 1
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 512Mi
    tls:
      caSecret: {}
      enable: true
      generate: true
      secret: {}
    version: 2.2.0
  general:
    httpPort: 9200
    serviceName: opensearch-cluster
    setVMMaxMapCount: true
    version: 2.2.0
  nodePools:
    - component: masters
      diskSize: 50Gi
      jvm: '-Xmx2048M -Xms2048M'
      persistence:
        pvc:
          accessModes:
            - ReadWriteOnce
          storageClass: premium-lrs-retain
      replicas: 3
      resources:
        limits:
          cpu: 500m
          memory: 3Gi
        requests:
          cpu: 500m
          memory: 1Gi
      roles:
        - master
        - data
  security:
    config:
      adminCredentialsSecret:
        name: opensearch-cluster-admin-secret
      adminSecret: {}
      securityConfigSecret:
        name: opensearch-cluster-securityconfig-secret
    tls:
      http:
        caSecret: {}
        generate: true
        secret: {}
      transport:
        caSecret: {}
        generate: true
        secret: {}

danielbichuetti avatar Sep 01 '22 17:09 danielbichuetti

Hey all, just FYI, I'm able to get the cluster up and running with the following YAML:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config: 
    tls:
       http:
         generate: true 
       transport:
         generate: true
         perNode: true
  general:
    httpPort: 9400
    serviceName: my-first-cluster
    version: 2.2.1
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    version: 2.2.1
    enable: true
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
         requests:
            memory: "1Gi"
            cpu: "550m"
         limits:
            memory: "1Gi"
            cpu: "550m"
      roles:
        - "data"
        # - "master"  # since version > 2.0.0, use cluster_manager instead
        - "cluster_manager"
      persistence:
         emptyDir: {}

I believe there is some confusion with roles: passing `- "cluster_manager"` for versions above 2.0.0 should work fine.
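In other words, the valid role name depends on the OpenSearch major version: 1.x expects `master`, 2.x renamed it to `cluster_manager`. A tiny (hypothetical) helper makes the mapping explicit:

```python
# Hypothetical helper: OpenSearch 2.x renamed the "master" role to "cluster_manager".
def master_role_name(opensearch_version: str) -> str:
    major = int(opensearch_version.split(".")[0])
    return "cluster_manager" if major >= 2 else "master"

print(master_role_name("1.3.2"))  # master
print(master_role_name("2.2.1"))  # cluster_manager
```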

prudhvigodithi avatar Sep 03 '22 03:09 prudhvigodithi

Oh, that's right! I'll check this tomorrow. Since the example used master and claimed to support both 1.x and 2.x, I assumed the operator used the CR to generate the correct config for each version.

I looked over the source for something else and missed this.

danielbichuetti avatar Sep 03 '22 03:09 danielbichuetti

Hey All, just FYI i'm able to get the cluster up and running with the following yaml […] I believe there is some confusion with roles passing - "cluster_manager" for above 2.0.0 should work fine.

@prudhvigodithi I can confirm (gladly) that the above works. Thanks.

P.S. - I did initially see the second master pod getting killed with a 137 error again, so I increased resources:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config:
    tls:
       http:
         generate: true 
       transport:
         generate: true
         perNode: true
  general:
    httpPort: 9400
    serviceName: my-first-cluster
    version: 2.2.1
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    tls:
      enable: true
      generate: true
    version: 2.2.1
    enable: true
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
         requests:
            memory: "4Gi"
            cpu: "1000m"
         limits:
            memory: "4Gi"
            cpu: "1000m"
      roles:
        - "data"
        - "cluster_manager"
      persistence:
         emptyDir: {}
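For context on the 137s above: container exit codes above 128 encode 128 plus the terminating signal number, so 137 corresponds to signal 9 (SIGKILL). On Kubernetes that almost always means the container was OOM-killed, which is why raising the memory request/limit helps:

```python
import signal

# Container exit codes > 128 mean "killed by signal (code - 128)".
exit_code = 137
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGKILL
```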

dobharweim avatar Sep 05 '22 15:09 dobharweim

Hey, thanks for the update @dobharweim and to all who participated here :). Closing this issue; please feel free to reopen if required. We should get the cluster_manager change reflected in the README docs. @idanl21 @segalziv @swoehrl-mw Thank you

prudhvigodithi avatar Sep 05 '22 16:09 prudhvigodithi

@prudhvigodithi I don't believe this should be closed until the changes have gone into the docs. The issue is that the documented cluster does not work - that has not changed. I can get a PR in at some stage hopefully this week, but the issue should remain open until it is resolved.

dobharweim avatar Sep 07 '22 10:09 dobharweim

@dickescheid thanks for all the help, time, and input early on!

dobharweim avatar Sep 07 '22 10:09 dobharweim

Glad to help @dobharweim. Agreed, documentation is everything; as long as that is not fixed, the issues will not cease, even if the underlying behaviour is not actually a bug.

dickescheid avatar Sep 07 '22 10:09 dickescheid