[BUG] _remote/info API returns "connected: false" while replication is running in proxy mode.
What is the bug?
The _remote/info API returns the "connected" field as false even when replication is running seamlessly in proxy mode.
bash-4.4$ curl -XGET "https://opensearch-1:9200/_remote/info?pretty"
{
"leader-site" : {
"connected" : false,
"mode" : "proxy",
"proxy_address" : "localhost:9302",
"server_name" : "",
"num_proxy_sockets_connected" : 0,
"max_proxy_socket_connections" : 18,
"initial_connect_timeout" : "30s",
"skip_unavailable" : false
}
}
If the follower has more than one ingest node, the behaviour is different. When the API was called on ingest node 1, the result was false throughout; when it was called on ingest node 2, the result was true throughout.
bash-4.4$ curl -XGET "https://opensearch-2:9200/_remote/info?pretty"
{
"leader-site" : {
"connected" : true,
"mode" : "proxy",
"proxy_address" : "localhost:9302",
"server_name" : "",
"num_proxy_sockets_connected" : 0,
"max_proxy_socket_connections" : 18,
"initial_connect_timeout" : "30s",
"skip_unavailable" : false
}
}
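To make the discrepancy between the two responses above easier to spot, here is a minimal sketch that compares the "connected" flag reported by each node. The helper names `summarize_remote_info` and `nodes_disagree` are hypothetical, and the JSON bodies are trimmed copies of the two outputs quoted above; this is only an illustration of the comparison, not part of OpenSearch.

```python
import json


def summarize_remote_info(responses, alias):
    """Extract the fields of interest for one remote cluster alias.

    responses: dict mapping a node name to its parsed _remote/info body.
    alias: remote cluster alias, e.g. "leader-site".
    """
    summary = {}
    for node, body in responses.items():
        info = body.get(alias, {})
        summary[node] = {
            "connected": info.get("connected"),
            "num_proxy_sockets_connected": info.get("num_proxy_sockets_connected"),
        }
    return summary


def nodes_disagree(summary):
    """Return True when nodes report different values for "connected"."""
    return len({entry["connected"] for entry in summary.values()}) > 1


# Trimmed from the two responses quoted in this issue.
node1 = json.loads(
    '{"leader-site": {"connected": false, "mode": "proxy",'
    ' "num_proxy_sockets_connected": 0}}'
)
node2 = json.loads(
    '{"leader-site": {"connected": true, "mode": "proxy",'
    ' "num_proxy_sockets_connected": 0}}'
)

summary = summarize_remote_info(
    {"opensearch-1": node1, "opensearch-2": node2}, "leader-site"
)
print(summary)
print("nodes disagree:", nodes_disagree(summary))
```

Note that in the quoted output both nodes report num_proxy_sockets_connected as 0, including the node that reports connected as true.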
How can one reproduce the bug? Steps to reproduce the behavior:
- Install leader and follower OpenSearch clusters (each with one ingest, one data, and one master node) and create an autofollow replication rule on the follower.
- Ensure that the indices are getting replicated to the follower cluster.
- Check the connection status using the _remote/info API.
- The "connected" field returns false throughout, even though replication is happening.
If the follower has more than one ingest node, the behaviour is different. When the API was called on ingest node 1, the result was false throughout; when it was called on ingest node 2, the result was true throughout.
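The "false throughout" and "true throughout" observations in the steps above could be captured by sampling the API several times per node. The following is a rough sketch under stated assumptions: `fetch_remote_info` and `sample_connected` are hypothetical helper names, the cluster is assumed to be reachable without authentication and with a certificate the client trusts, and a secured cluster would need credentials and TLS settings added.

```python
import json
import time
import urllib.request


def fetch_remote_info(base_url):
    """GET <base_url>/_remote/info and return the parsed JSON body.

    Assumption: no authentication and a trusted certificate.
    """
    with urllib.request.urlopen(f"{base_url}/_remote/info", timeout=10) as resp:
        return json.load(resp)


def sample_connected(base_url, alias, samples=5, interval_s=1.0,
                     fetch=fetch_remote_info, sleep=time.sleep):
    """Return the "connected" value observed on each of `samples` calls.

    `fetch` and `sleep` are injectable so the function can be exercised
    without a live cluster.
    """
    observed = []
    for i in range(samples):
        body = fetch(base_url)
        observed.append(body.get(alias, {}).get("connected"))
        if i < samples - 1:
            sleep(interval_s)
    return observed
```

Running `sample_connected("https://opensearch-1:9200", "leader-site")` against each ingest node and comparing the returned lists would show whether a node's answer ever changes over time or only differs between nodes.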
What is the expected behavior? The "connected" field in _remote/info should consistently be "true" when the connection between the passive and active sites is successful and replication is running seamlessly. Is this API valid in these scenarios?
Hi @ankitkala,
Any views on this?
Looks like there might be a bug. Can you take a stab at why this might be happening?
Hi @ankitkala ,
I can look into it, but I would like to know whether this could actually impact the system at a high level. Could you check and let us know if you have any information on this? We observe it often in our deployment.
Hi @dblock , @ankitkala
Any update on the above ticket?
This API is very unreliable. Please help get this issue resolved as soon as possible. Currently I don't see any way to validate that the follower is connecting to the leader when all replicated indices are in the PAUSED state.
Hi @skumarp7, no one is dedicated to supporting this repository (unless an issue is a critical release blocker). Feel free to debug the issue. If you have any ideas or hypotheses you want to discuss, one of the maintainers would definitely be able to help.