Cross Cluster Replication across 2 k8s clusters is not working

Open skumarp7 opened this issue 2 years ago • 10 comments

Hi,

We are trying to set up cross-cluster replication between two Kubernetes clusters. We are unable to establish the connection between the leader and the follower over a NodePort service using the APIs below, which are taken from the OpenSearch documentation. In our case, the security plugin is not enabled.

We tried the following call to _cluster/settings:

curl -XPUT "http://${FOLLOWER}/_cluster/settings?pretty" \
  -H 'Content-Type: application/json' -d "
{
\"persistent\": {
   \"cluster\": {
     \"remote\": {
       \"leader-cluster\": {
          \"seeds\": [ \"${LEADER_TRANSPORT}\" ]
       }
     }
   }
 }
}
"

We also tried the _autofollow API call so that the follower replicates the required indices:

curl -XPOST "http://${FOLLOWER}/_plugins/_replication/_autofollow?pretty" -H 'Content-type: application/json' -d '{ "leader_alias" : "leader-cluster", "name": "test-rule", "pattern": "test*" }'

output:

{
 "error" : {
 "root_cause" : [
   {
     "type" : "illegal_state_exception",
     "reason" : "Unable to open any connections to remote cluster [leader-cluster]"
   }
 ],
 "type" : "illegal_state_exception",
 "reason" : "Unable to open any connections to remote cluster [leader-cluster]"
},
"status" : 500
}

Query: Does the Cross-Cluster Replication plugin work across two different Kubernetes clusters? Is any other configuration required for the connection to be established?

Note: Cross-cluster replication works within a single Kubernetes cluster when the leader and follower OpenSearch clusters are in different namespaces.

skumarp7 avatar Nov 22 '22 07:11 skumarp7

@skumarp7 Could you provide more details on the LEADER_TRANSPORT used for the remote cluster connection?

saikaranam-amazon avatar Nov 25 '22 05:11 saikaranam-amazon

@saikaranam-amazon, LEADER_TRANSPORT is the : ( nodeport of the opensearch service exposed on 9300 port)

skumarp7 avatar Nov 30 '22 05:11 skumarp7

If the transport address is in the form node_ip:9300, could you please confirm that the follower cluster nodes have network access to node_ip on port 9300?
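
A minimal sketch of such a reachability check, assuming LEADER_TRANSPORT holds a "host:port" string and that bash and timeout are available where it runs. The helper names (seed_host, seed_port, can_connect) are made up for illustration. An open port only shows that something accepts TCP connections there; it does not prove that the OpenSearch transport handshake will succeed.

```shell
#!/usr/bin/env bash
# Sketch: check that the leader's transport endpoint accepts TCP connections.
# Assumption: the argument is "host:port", e.g. the value of LEADER_TRANSPORT.

# Split "host:port" with plain parameter expansion.
seed_host() { printf '%s\n' "${1%:*}"; }
seed_port() { printf '%s\n' "${1##*:}"; }

# Return 0 if a TCP connection opens within 5 seconds (uses bash's /dev/tcp).
can_connect() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$(seed_host "$1")/$(seed_port "$1")" 2>/dev/null
}

# Example, to be run from inside a follower pod so that the check uses the
# follower's own network path:
# can_connect "${LEADER_TRANSPORT}" && echo reachable || echo unreachable
```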

saikaranam-amazon avatar Nov 30 '22 05:11 saikaranam-amazon

May be linked with https://github.com/opensearch-project/cross-cluster-replication/issues/371

mrMigles avatar May 19 '23 05:05 mrMigles

Port 9300 must be reachable from both sites for them to communicate. Try deploying both the prod and DR sites on the same cluster in different namespaces, and use the service name in your replication connection configuration. It will work, because port 9300 uses a binary protocol as per the docs.

zalseryani avatar May 21 '23 10:05 zalseryani

Hi,

Replication works when the follower and leader clusters are part of the same Kubernetes cluster (in different namespaces). Here we are trying to replicate indices from an OpenSearch cluster in one Kubernetes cluster to a follower that is in a different Kubernetes cluster.

skumarp7 avatar May 22 '23 04:05 skumarp7

@skumarp7

Then you can do the following:

  • Expose both OpenSearch services through a load balancer (a Kubernetes Service of type LoadBalancer).
  • Create a DNS record for each public IP address.
  • Create a self-signed certificate for both domains if you don't want an SSL certificate from a well-known CA.
  • Use the production domain in your DR site when configuring the replication.

This is the best I can think of for your case.

I hope that I have helped.

zalseryani avatar May 22 '23 10:05 zalseryani

@skumarp7

Solution 1: OpenSearch with a Kubernetes LoadBalancer Service

I tried replication between 2 k8s clusters and it worked with a LoadBalancer service for OpenSearch.

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch
    meta.helm.sh/release-namespace: opensearch-prod
  labels:
    app.kubernetes.io/component: opensearch-cluster-master
    app.kubernetes.io/instance: opensearch
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.7.0
    helm.sh/chart: opensearch-2.12.0
  name: opensearch-cluster-master-lb
  namespace: opensearch-prod
spec:
  ports:
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  selector:
    app.kubernetes.io/instance: opensearch
    app.kubernetes.io/name: opensearch
  sessionAffinity: None
  type: LoadBalancer
  • After creating this service, you can use the following request to create a remote cross-cluster connection:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "my-connection-lb-test": {
          "seeds": ["<Load Balancer Service Public IP>:9300"]
        }
      }
    }
  }
}
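
The same request can be sketched as a shell snippet that looks up the load balancer address and fills in the seed. The service name and namespace come from the manifest above; FOLLOWER, the credentials, and the helper name remote_seed_payload are placeholders, not something from the original setup. Some cloud providers publish a hostname instead of an IP, in which case the jsonpath would need .hostname rather than .ip.

```shell
#!/usr/bin/env bash
# Sketch: register the leader's LoadBalancer address as a remote-cluster seed.

# Print the _cluster/settings body for a connection alias ($1) and a
# "host:port" seed ($2).
remote_seed_payload() {
  printf '{"persistent":{"cluster":{"remote":{"%s":{"seeds":["%s"]}}}}}\n' "$1" "$2"
}

# Assumed usage (FOLLOWER and the credentials are placeholders):
# LB_IP=$(kubectl get svc opensearch-cluster-master-lb -n opensearch-prod \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# curl -XPUT "https://${FOLLOWER}/_cluster/settings?pretty" -k -u 'username:password' \
#   -H 'Content-Type: application/json' \
#   -d "$(remote_seed_payload my-connection-lb-test "${LB_IP}:9300")"
```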

Solution 2: Nginx-Ingress Controller Configuration with Configmap.

Because 9300 is a plain TCP port, you can add it to the Nginx controller's ConfigMap for TCP services, so that any connection arriving on port 9300 at the public IP address of the Nginx controller service is forwarded to the OpenSearch service on port 9300.
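
A rough sketch of that ConfigMap change, assuming the community ingress-nginx controller was started with --tcp-services-configmap pointing at a ConfigMap named tcp-services in the ingress-nginx namespace, and that the OpenSearch service is called opensearch-cluster-master in opensearch-prod. All of those names are assumptions, and tcp_services_patch is a made-up helper. The controller's own Service also has to list port 9300 for traffic to arrive.

```shell
#!/usr/bin/env bash
# Sketch: add a TCP forwarding entry to the ingress-nginx tcp-services ConfigMap.
# Entry format: "<exposed port>": "<namespace>/<service>:<service port>"

# Print a JSON merge patch for one entry.
# $1 = exposed port, $2 = namespace, $3 = service name, $4 = service port
tcp_services_patch() {
  printf '{"data":{"%s":"%s/%s:%s"}}\n' "$1" "$2" "$3" "$4"
}

# Assumed usage (ConfigMap name and namespace depend on how the controller was installed):
# kubectl patch configmap tcp-services -n ingress-nginx --type merge \
#   -p "$(tcp_services_patch 9300 opensearch-prod opensearch-cluster-master 9300)"
```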

Note: In both solutions I did not create my own self-signed certificate. I used the default self-signed certificate generated by OpenSearch, because I disable hostname verification in the opensearch.yml file (see the screenshot below).

image

Hope that I helped.

Thank you. If you test it and it works for you, kindly confirm it here so that others facing the same case have the solution available.

zalseryani avatar May 23 '23 11:05 zalseryani

This does not work, even after exposing the TCP port via Ingress and then creating the remote connection:

{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "remote" : {
        "remote-opensearch-cluster" : {
          "seeds" : [
            "10.56.34.99:9300"
          ]
        }
      }
    }
  },
  "transient" : { }
}

When I then start replication, it still times out and displays an internal IP in the error message:

HTTP/1.1 500 Internal Server Error
content-type: application/json; charset=UTF-8
content-length: 354

{
    "error": {
        "root_cause": [
            {
                "type": "connect_transport_exception",
                "reason": "[opensearch-cluster-master-1][10.42.149.35:9300] connect_timeout[30s]"
            }
        ],
        "type": "connect_transport_exception",
        "reason": "[opensearch-cluster-master-1][10.42.149.35:9300] connect_timeout[30s]"
    },
    "status": 500
}

I checked that the remote port is open and accessible:

❯ telnet 10.56.34.99 9300
Trying 10.56.34.99...
Connected to 10.56.34.99.
Escape character is '^]'.
Connection closed by foreign host.
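
The address in the error (10.42.149.35) is not the configured seed (10.56.34.99). That is consistent with the default sniff mode, where the follower contacts the seed and then connects directly to the node addresses the leader publishes, which behind an ingress are presumably pod IPs. One way to see which mode a connection uses is the _remote/info API. The sketch below assumes FOLLOWER and the credentials as placeholders, and remote_mode is a made-up helper.

```shell
#!/usr/bin/env bash
# Sketch: report the connection mode ("sniff" or "proxy") of a remote cluster.

# Extract the "mode" field from a _remote/info response read on stdin.
remote_mode() {
  sed -n 's/.*"mode"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Assumed usage (FOLLOWER and the credentials are placeholders):
# curl -s -k -u 'username:password' "https://${FOLLOWER}/_remote/info?pretty" | remote_mode
```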

kha7iq avatar Sep 27 '23 08:09 kha7iq

This is how I made cross-cluster replication work between two OpenSearch clusters deployed in Kubernetes across different regions.

  1. Expose TCP Ports on Ingress (Traefik):

You can achieve this by adding an entry in your Traefik configuration, either by modifying the values.yaml file using Helm or manually editing the config.

Here's an example of what the entry might look like in your Helm chart's values.yaml.

   os-cluster-port:
     expose: true
     exposedPort: 9300
     port: 9300
     protocol: TCP
   os-api-port:
     expose: true
     exposedPort: 9200
     port: 9200
     protocol: TCP
  2. Apply IngressRouteTCP to Both Clusters:

Create IngressRouteTCP resources on both clusters to route traffic to the appropriate OpenSearch cluster.

# For API Port
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  labels:
    app.kubernetes.io/component: elastic
  name: opensearch-api-port
spec:
  entryPoints:
  - os-api-port
  routes:
  - match: HostSNI(`*`)
    services:
    - name: opensearch-cluster-master-headless # <service name>
      port: 9200
---
# For Cluster Port
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  labels:
    app.kubernetes.io/component: opensearch
  name: opensearch-cluster-port
spec:
  entryPoints:
  - os-cluster-port
  routes:
  - match: HostSNI(`*`)
    services:
    - name: opensearch-cluster-master-headless # <service name>
      port: 9300

  3. Check Connectivity for Both Clusters:

curl -u username:password -X GET -k 'https://leader-cluster.example.com:9200'
curl -u username:password -X GET -k 'https://follower-cluster.example.com:9200'

  4. Create a Connection on the Follower Cluster:
# Proxy Mode
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://follower-cluster.example.com:9200/_cluster/settings?pretty' -d '
{
  "persistent": {
    "cluster": {
      "remote": {
        "opensearch-cluster-eu": {
          "mode": "proxy", 
          "proxy_address": "10.56.34.99:9300"
        }
      }
    }
  }
}'

  5. Create an Index on the Leader Cluster:
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://leader-cluster.example.com:9200/leader-01?pretty'

  6. Start Replication:
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://follower-cluster.example.com:9200/_plugins/_replication/follower-01/_start?pretty' -d '
{
   "leader_alias": "opensearch-cluster-eu",
   "leader_index": "leader-01",
   "use_roles": {
      "leader_cluster_role": "all_access",
      "follower_cluster_role": "all_access"
   }
}'

  7. Check Replication Status:
curl -XGET -k -u 'username:password' 'https://follower-cluster.example.com:9200/_plugins/_replication/follower-01/_status?pretty'
# Output
{
  "status": "SYNCING",
  "reason": "User initiated",
  "leader_alias": "opensearch-cluster-eu",
  "leader_index": "leader-01",
  "follower_index": "follower-01",
  "syncing_details": {
    "leader_checkpoint": -1,
    "follower_checkpoint": -1,
    "seq_no": 0
  }
}
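
If the status check needs to be scripted, something along these lines might do. It is only a sketch: replication_status and wait_for_syncing are hypothetical helpers, the credentials are placeholders, and the retry count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll the replication status of a follower index until it is SYNCING.

# Extract the first "status" value from a _status response read on stdin.
replication_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# $1 = follower base URL, $2 = follower index. Tries 10 times, 5 seconds apart.
wait_for_syncing() {
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    s=$(curl -s -k -u 'username:password' \
      "$1/_plugins/_replication/$2/_status" | replication_status)
    [ "$s" = "SYNCING" ] && return 0
    sleep 5
  done
  return 1
}

# Assumed usage:
# wait_for_syncing 'https://follower-cluster.example.com:9200' follower-01
```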

kha7iq avatar Oct 03 '23 13:10 kha7iq