[BUG] Docker Swarm multi-node cluster connectivity issue
Description:
I am facing an issue where my OpenSearch cluster won't form. I am running 3 virtual machines joined into a Docker Swarm cluster, and I want to run one OpenSearch container on each virtual machine and have them form a single cluster.
To Reproduce:
Steps to reproduce the behavior:
- Create a 3-VM Docker Swarm cluster and add the node labels referenced by the placement constraints (see the label sketch after this list).
- Copy the compose definition below into a docker-compose.yml file.
- Deploy it with `docker stack deploy -c <file-name> <stack-name>`, e.g. `docker stack deploy -c docker-compose.yml opensearch`.
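For reference, this is roughly how the `db` labels used by the placement constraints can be assigned from a swarm manager node; the node names `vm1`/`vm2`/`vm3` are placeholders for the actual hostnames reported by `docker node ls`:

```bash
# List the swarm nodes to get their hostnames (vm1/vm2/vm3 below are placeholders).
docker node ls

# Assign the db label values referenced by the placement constraints
# in the compose file (ubuntu, node1, node2).
docker node update --add-label db=ubuntu vm1
docker node update --add-label db=node1 vm2
docker node update --add-label db=node2 vm3

# Verify that the labels were applied.
docker node inspect vm1 --format '{{ .Spec.Labels }}'
```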
docker-compose.yml
```yaml
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
      - 9300:9300
    deploy:
      placement:
        constraints:
          - "node.labels.db == ubuntu"
    networks:
      - opensearch-net

  opensearch-node2:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data2:/usr/share/opensearch/data
    deploy:
      placement:
        constraints:
          - "node.labels.db == node1"
    networks:
      - opensearch-net

  opensearch-node3:
    image: opensearchproject/opensearch:2.18.0
    #container_name: opensearch-node3
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2,opensearch-node3
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2,opensearch-node3
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=Hamza@31017
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - /opt/os/data3:/usr/share/opensearch/data
    deploy:
      placement:
        constraints:
          - "node.labels.db == node2"
    networks:
      - opensearch-net

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.18.0
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch-node1:9200","http://opensearch-node2:9200","http://opensearch-node3:9200"]'
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"
    networks:
      - opensearch-net

networks:
  opensearch-net:
```
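Once the stack is deployed, placement and replica state can be checked from a manager node, for example (assuming the stack was deployed with the name `opensearch` as in the command above):

```bash
# Show the services in the stack and how many replicas are running.
docker stack services opensearch

# Show which swarm node each task was scheduled on and its current state.
docker service ps opensearch_opensearch-node1
docker service ps opensearch_opensearch-node2
docker service ps opensearch_opensearch-node3
```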
ISSUE:
```
[WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-node1] cluster-manager not discovered or elected yet, an election requires at least 2 nodes with ids from [hTfIxK_mST-qJgwiGS0w6w, N8R2kscGT1GAbK-L-mj_qQ, Yesi_vQZQyC-iCwDRkrjuw], have discovered [{opensearch-node1}{hTfIxK_mST-qJgwiGS0w6w}{SCJ76MhQQT6Q2ql-KMPpBg}{10.0.0.80}{10.0.0.80:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-node2}{4J085Ma_T3qvuPUshCdLBA}{TodgiY4lQ6KY5RF8pZblNQ}{10.0.1.17}{10.0.1.17:9300}{dimr}{shard_indexing_pressure_enabled=true}, {opensearch-node2}{MrJ4dBqdQO2QlHp0RDulPA}{qCoQee04QuaSsO-J8Wc8mQ}{10.0.1.18}{10.0.1.18:9300}{dimr}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [10.0.1.5:9300, 10.0.1.7:9300, 10.0.1.10:9300] from hosts providers and [{opensearch-node1}{hTfIxK_mST-qJgwiGS0w6w}{SCJ76MhQQT6Q2ql-KMPpBg}{10.0.0.80}{10.0.0.80:9300}{dimr}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 3, last-accepted version 89 in term 3
```
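To see which nodes have actually joined, the cluster can be queried on the published 9200 port from one of the VMs (node1's port is published, so the swarm routing mesh should expose it on every swarm node; plain HTTP without credentials works here because the security plugin is disabled in this compose file), for example:

```bash
# Cluster health: "number_of_nodes" should be 3 once the cluster has formed.
curl -s http://localhost:9200/_cluster/health?pretty

# List the nodes that have joined, including their names and IPs.
curl -s http://localhost:9200/_cat/nodes?v
```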
Expected behavior:
The cluster should form just as it does with docker-compose on a single VM: if I remove the placement constraints from the docker-compose.yml and run everything on one VM, it works fine. I need it to work with Docker Swarm, with one container on each node, forming a single cluster.
Host/Environment:
- OS: Linux - Ubuntu
- Version 20.04
@hamzaismaeel15 That looks more like you only have a single node running in the swarm? Are you sure all the containers are running correctly?
@DandyDeveloper All containers are deployed on different virtual machines; as you can see from the placement constraints defined in the file, each container is placed on a dedicated virtual machine. If possible, you could join me on a call to see the setup. Thank you
@hamzaismaeel15 Sorry, I'm unable to join a call, but if possible, can you go into the container and verify that your CRI / CNI has appropriately configured the hostnames and that the other nodes are resolvable on the network?
The implication is certainly that the swarm side of things is either misconfigured, or the hosts are, but we'll need a lot more info to dive into it.
Exec into a container and try polling the other nodes to establish whether the networking is set up correctly.
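For example, something along these lines might work (assuming curl is available inside the OpenSearch image; `<container-id>` is a placeholder for the ID reported by `docker ps`):

```bash
# Find the locally running OpenSearch task's container ID.
docker ps --filter name=opensearch_opensearch-node1

# Exec into it and check that the other nodes resolve and answer over HTTP.
docker exec -it <container-id> curl -s http://opensearch-node2:9200
docker exec -it <container-id> curl -s http://opensearch-node3:9200

# The transport port (9300) also needs to be reachable between nodes;
# a "Connected to ..." line in the verbose output shows the TCP connection
# succeeded (the command then times out after 3 seconds).
docker exec -it <container-id> curl -v --max-time 3 telnet://opensearch-node2:9300
```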
@DandyDeveloper Sorry, I have only just had a chance to see your comment. I have deployed it on multiple VMs and everything works fine for me now; the networking works, I have solved all the issues, and it runs via Swarm. Thanks