cross-cluster-replication
Cross Cluster Replication across 2 k8s clusters is not working
Hi,
We are trying to set up cross-cluster replication between two Kubernetes clusters. We are unable to establish the connection between the leader and the follower over a NodePort using the APIs below, taken from the OpenSearch documentation. In our case, the security plugin is not enabled.
We first made an API call to _cluster/settings:
curl -XPUT "http://${FOLLOWER}/_cluster/settings?pretty" -H 'Content-Type: application/json' -d "
{
  \"persistent\": {
    \"cluster\": {
      \"remote\": {
        \"leader-cluster\": {
          \"seeds\": [ \"${LEADER_TRANSPORT}\" ]
        }
      }
    }
  }
}
"
We have also tried the _autofollow API so that the follower replicates the required indices:
curl -XPOST "http://${FOLLOWER}/_plugins/_replication/_autofollow?pretty" -H 'Content-Type: application/json' -d '{ "leader_alias": "leader-cluster", "name": "test-rule", "pattern": "test*" }'
output:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Unable to open any connections to remote cluster [leader-cluster]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "Unable to open any connections to remote cluster [leader-cluster]"
  },
  "status" : 500
}
Query: Does the cross-cluster replication plugin work across two different Kubernetes clusters? Is any other configuration required to establish the connection?
Note: Cross-cluster replication works within a single Kubernetes cluster when the leader and follower OpenSearch clusters are in different namespaces.
@skumarp7 Could you provide more details on the LEADER_TRANSPORT used for the remote cluster connection?
@saikaranam-amazon, LEADER_TRANSPORT is the
If the transport is in the form node_ip:9300, could you please confirm that the follower cluster nodes have network access to node_ip on port 9300?
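One way to run that check from inside the follower's network, as a minimal sketch. The function name `check_transport` and the three-second default are illustrative choices, and it assumes `bash` and the coreutils `timeout` command are available in the pod or host you run it from.

```shell
#!/usr/bin/env bash
# check_transport HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Uses bash's built-in /dev/tcp redirection, so telnet or nc is not required.
check_transport() {
  local host="$1" port="$2" wait="${3:-3}"
  timeout "$wait" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (node_ip is a placeholder for the leader's address):
#   check_transport node_ip 9300 && echo reachable || echo unreachable
```

A successful connect only shows that the port accepts TCP connections from where the check runs; it should be run from the follower nodes themselves rather than from a workstation.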
May be linked with https://github.com/opensearch-project/cross-cluster-replication/issues/371
Port 9300 must be reachable between both sites, because replication uses the binary transport protocol on that port, as per the docs. Try deploying both the prod and DR sites on the same cluster in different namespaces, and use the service name in your replication connection configuration; that will work.
Hi,
Replication works when the follower and leader clusters are part of the same Kubernetes cluster (in different namespaces). Here we are trying to replicate indices from an OpenSearch cluster in one Kubernetes cluster to a follower that runs in a different Kubernetes cluster.
@skumarp7
Then you can do the following:
- Expose both OpenSearch services through a load balancer (a Kubernetes Service of type LoadBalancer).
- Create a DNS record for each public IP address.
- Create a self-signed certificate for both domains if you don't want an SSL certificate from a well-known CA.
- Use the production domain in your DR site when configuring the replication.
This is the best I can think of for your case.
I hope that I have helped.
@skumarp7
Solution 1: OpenSearch with a Kubernetes LoadBalancer Service
I tried replication between 2 k8s clusters and it worked with a LoadBalancer service for OpenSearch.
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: opensearch
    meta.helm.sh/release-namespace: opensearch-prod
  labels:
    app.kubernetes.io/component: opensearch-cluster-master
    app.kubernetes.io/instance: opensearch
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opensearch
    app.kubernetes.io/version: 2.7.0
    helm.sh/chart: opensearch-2.12.0
  name: opensearch-cluster-master-lb
  namespace: opensearch-prod
spec:
  ports:
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  selector:
    app.kubernetes.io/instance: opensearch
    app.kubernetes.io/name: opensearch
  sessionAffinity: None
  type: LoadBalancer
- After creating this service, you can use the following request to create the remote cross-cluster connection:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "my-connection-lb-test": {
          "seeds": ["<Load Balancer Service Public IP>:9300"]
        }
      }
    }
  }
}
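The same request can be scripted once the load balancer has an address. This is only a sketch: `seed_settings` is a hypothetical helper, `FOLLOWER` is the follower's HTTP endpoint as in the original question, and the `jsonpath` expression assumes the cloud provider reports an IP (some report a hostname instead).

```shell
#!/usr/bin/env bash
# seed_settings ALIAS SEED
# Prints the _cluster/settings body that registers SEED (host:port) under ALIAS.
seed_settings() {
  local alias="$1" seed="$2"
  printf '{"persistent":{"cluster":{"remote":{"%s":{"seeds":["%s"]}}}}}\n' "$alias" "$seed"
}

# Hypothetical usage (service and namespace names are taken from the manifest above):
#   LB_IP=$(kubectl get svc opensearch-cluster-master-lb -n opensearch-prod \
#             -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   seed_settings my-connection-lb-test "${LB_IP}:9300" |
#     curl -XPUT "http://${FOLLOWER}/_cluster/settings?pretty" \
#          -H 'Content-Type: application/json' -d @-
```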
Solution 2: Nginx-Ingress Controller Configuration with a ConfigMap
Since 9300 is a TCP port, you can add it to the Nginx controller ConfigMap, so that any request arriving on port 9300 at the public IP address of the Nginx controller service is forwarded to the OpenSearch service on port 9300.
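A sketch of what that ConfigMap entry could look like, assuming the community ingress-nginx controller started with `--tcp-services-configmap` pointing at this ConfigMap, and with port 9300 also added to the controller's own Service. The ConfigMap name `tcp-services`, the `ingress-nginx` namespace, and the OpenSearch service name are assumptions to adapt to your installation.

```shell
#!/usr/bin/env bash
# tcp_services_manifest NAMESPACE SERVICE PORT
# Prints a ConfigMap in the tcp-services format that ingress-nginx reads:
#   "<external port>": "<namespace>/<service>:<service port>"
tcp_services_manifest() {
  local ns="$1" svc="$2" port="$3"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "${port}": "${ns}/${svc}:${port}"
EOF
}

# Example (assumed service name):
#   tcp_services_manifest opensearch-prod opensearch-cluster-master 9300 | kubectl apply -f -
```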
Note: In both solutions I did not create a new self-signed certificate. I used the default self-signed certificate generated by OpenSearch, because hostname verification is disabled in my opensearch.yml file.
Hope that I helped.
Thank you. If you test this and it works for you, please confirm it here so that others facing the same case can find the solution.
This does not work, even after exposing the TCP port via Ingress and then creating the remote connection request:
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "remote" : {
        "remote-opensearch-cluster" : {
          "seeds" : [
            "10.56.34.99:9300"
          ]
        }
      }
    }
  },
  "transient" : { }
}
When I then start the replication request, it still times out and displays an internal IP in the error message:
HTTP/1.1 500 Internal Server Error
content-type: application/json; charset=UTF-8
content-length: 354
{
  "error": {
    "root_cause": [
      {
        "type": "connect_transport_exception",
        "reason": "[opensearch-cluster-master-1][10.42.149.35:9300] connect_timeout[30s]"
      }
    ],
    "type": "connect_transport_exception",
    "reason": "[opensearch-cluster-master-1][10.42.149.35:9300] connect_timeout[30s]"
  },
  "status": 500
}
I checked that the remote port is open and accessible:
❯ telnet 10.56.34.99 9300
Trying 10.56.34.99...
Connected to 10.56.34.99.
Escape character is '^]'.
Connection closed by foreign host.
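A likely reading of this error, not confirmed in the thread: a remote configured with `seeds` uses sniff mode, in which the follower contacts the seed, learns the leader nodes' publish addresses, and then dials those addresses directly. Inside Kubernetes those are pod IPs such as 10.42.149.35, which another cluster usually cannot route to, even though the seed itself accepts connections. The sketch below uses hypothetical helper names to pull the dialed address out of the error reason and compare it with the configured seed.

```shell
#!/usr/bin/env bash
# failed_address: reads an error "reason" string on stdin and prints the
# host:port that the follower actually tried to dial.
failed_address() {
  sed -n 's/.*\]\[\([^]]*\)\] connect_timeout.*/\1/p'
}

# dialed_other_than_seed REASON SEED
# Returns 0 when the dialed address differs from the configured seed.
dialed_other_than_seed() {
  local dialed
  dialed=$(printf '%s\n' "$1" | failed_address)
  [ -n "$dialed" ] && [ "$dialed" != "$2" ]
}

reason='[opensearch-cluster-master-1][10.42.149.35:9300] connect_timeout[30s]'
if dialed_other_than_seed "$reason" '10.56.34.99:9300'; then
  echo "follower dialed $(printf '%s\n' "$reason" | failed_address), not the seed"
fi
```

If the two addresses differ, a remote connection in proxy mode, where the follower only ever connects to one configured address, may be a better fit than `seeds`.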
This is how I made cross-cluster replication work between two OpenSearch clusters deployed in Kubernetes across different regions.
- Expose TCP Ports on Ingress (Traefik):
You can achieve this by adding an entry in your Traefik configuration, either by modifying the values.yaml file using Helm or manually editing the config.
Here's an example of what the entry might look like in your Helm chart's values.yaml.
os-cluster-port:
  expose: true
  exposedPort: 9300
  port: 9300
  protocol: TCP
os-api-port:
  expose: true
  exposedPort: 9200
  port: 9200
  protocol: TCP
- Apply IngressRouteTCP to Both Clusters:
Create IngressRouteTCP resources on both clusters to route traffic to the appropriate OpenSearch cluster.
# For API Port
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  labels:
    app.kubernetes.io/component: elastic
  name: opensearch-api-port
spec:
  entryPoints:
    - os-api-port
  routes:
    - match: HostSNI(`*`)
      services:
        - name: opensearch-cluster-master-headless # <service name>
          port: 9200
---
# For Cluster Port
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  labels:
    app.kubernetes.io/component: opensearch
  name: opensearch-cluster-port
spec:
  entryPoints:
    - os-cluster-port
  routes:
    - match: HostSNI(`*`)
      services:
        - name: opensearch-cluster-master-headless # <service name>
          port: 9300
- Check Connectivity for Both Clusters:
curl -u username:password -X GET -k 'https://leader-cluster.example.com:9200'
curl -u username:password -X GET -k 'https://follower-cluster.example.com:9200'
- Create a Connection on the Follower Cluster:
# Proxy Mode
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://follower-cluster.example.com:9200/_cluster/settings?pretty' -d '
{
  "persistent": {
    "cluster": {
      "remote": {
        "opensearch-cluster-eu": {
          "mode": "proxy",
          "proxy_address": "10.56.34.99:9300"
        }
      }
    }
  }
}'
- Create an Index on the Leader Cluster:
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://leader-cluster.example.com:9200/leader-01?pretty'
- Start Replication:
curl -XPUT -k -H 'Content-Type: application/json' -u 'username:password' 'https://follower-cluster.example.com:9200/_plugins/_replication/follower-01/_start?pretty' -d '
{
  "leader_alias": "opensearch-cluster-eu",
  "leader_index": "leader-01",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}'
- Check Replication Status:
curl -XGET -k -u 'username:password' 'https://follower-cluster.example.com:9200/_plugins/_replication/follower-01/_status?pretty'
# Output
{
  "status": "SYNCING",
  "reason": "User initiated",
  "leader_alias": "opensearch-cluster-eu",
  "leader_index": "leader-01",
  "follower_index": "follower-01",
  "syncing_details": {
    "leader_checkpoint": -1,
    "follower_checkpoint": -1,
    "seq_no": 0
  }
}
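The status check in the last step can be repeated until the follower reports SYNCING. A minimal sketch follows: the helper names, the `OS_AUTH` variable, and the retry defaults are all illustrative, and the field is extracted with `sed` only to avoid assuming `jq` is installed.

```shell
#!/usr/bin/env bash
# replication_status: reads a _status response on stdin and prints the value
# of its "status" field (for example SYNCING).
replication_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_syncing URL [ATTEMPTS] [DELAY_SECONDS]
# Polls the _status endpoint and returns 0 once the status is SYNCING,
# or 1 if it never reaches that state within the given attempts.
wait_for_syncing() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" status i
  for i in $(seq 1 "$attempts"); do
    status=$(curl -s -k -u "${OS_AUTH:-username:password}" "$url" | replication_status)
    [ "$status" = "SYNCING" ] && return 0
    sleep "$delay"
  done
  return 1
}

# Example:
#   wait_for_syncing 'https://follower-cluster.example.com:9200/_plugins/_replication/follower-01/_status'
```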