docs Mention best practice of using multiple `--join` arguments in a cluster

It is usually advisable to give rethinkdb multiple --join arguments, one for each other server in the cluster. That way, if one other server is down when restarting a RethinkDB node, it will still be able to connect to the cluster rather than waiting (possibly forever) for that one server to come up.

A RethinkDB server started with multiple --join options will become available once it's able to connect to at least one of them. If it can't connect to any of the listed servers, it will stay in a partially available state and keep trying to connect. In this state it can respond to incoming cluster connections from other nodes, but will not accept any client connections nor process queries.
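As a rough illustration of the practice described above, here is a minimal Python sketch that builds a `rethinkdb` command line with one `--join` per other server. The hostnames are hypothetical, and the port assumes the default cluster port of 29015; adjust both for a real deployment.

```python
import shlex

# Assumption: servers use RethinkDB's default intracluster port.
CLUSTER_PORT = 29015

def build_command(this_host, all_hosts, port=CLUSTER_PORT):
    """Return an argv list that joins every cluster member except this one."""
    argv = ["rethinkdb", "--bind", "all"]
    for host in all_hosts:
        if host != this_host:
            # One --join per other server, so a single down peer
            # does not keep this node from reaching the cluster.
            argv += ["--join", f"{host}:{port}"]
    return argv

hosts = ["srv1", "srv2", "srv3"]  # hypothetical hostnames
print(shlex.join(build_command("srv1", hosts)))
```

The same list of hosts can be reused on every machine; only `this_host` changes.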

http://rethinkdb.com/docs/start-a-server/ might be the best place to put this (under "A RethinkDB cluster using multiple machines").

This came up on our community Slack channel.

Oct 12 '15 18:10 danielmewes

Is there any problem if the nodes join up in a line, instead of all joining the same instance?

For example, say we have 3 servers, A, B, and C.

C comes up first, then B joins C, and finally A joins B. Each server has the same config (--join A --join B --join C), but the join chain of events is dependent on starting timing.

Does A -> B -> C instead of A,B -> C pose any problems?

Jan 13 '16 00:01 karlrasche

@karlrasche if I am understanding your question correctly, there should not be any issues. You can start servers serially, each one joining the server before it (and thus all other running servers in the cluster), or you can start a single server and then join all of the other servers to only that one.

Or you can do what @danielmewes was referencing: give each starting server a --join option for each of the servers (running or not) in your cluster. The starting server will then try to connect to all of the given servers, plus any servers it learns about when connecting to any of those.
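To see why the join order should not matter, here is a simplified model in Python. It is not RethinkDB's actual implementation: it only assumes, as described above, that a server learns about every server known to the ones it connects to, so membership is the set of servers reachable through join links.

```python
from collections import deque

def reachable(start, links):
    """Servers that `start` ends up knowing, following connections transitively."""
    # Treat each successful join as an undirected link between two servers.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for peer in neighbours.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

chain = [("B", "C"), ("A", "B")]  # B joins C, then A joins B
star = [("A", "C"), ("B", "C")]   # A and B both join C
assert reachable("A", chain) == reachable("A", star) == {"A", "B", "C"}
```

Under this model the chain A -> B -> C and the star A,B -> C produce the same cluster membership.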

The only weird situation is if you have two running clusters (even clusters of 1 node) that don't know about each other and you join a new server to both of them. It will wind up introducing the two clusters to each other, grouping them together into one cluster. This can have some odd side-effects, specifically if you happen to have databases or tables with the same names in both of the original clusters.
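If you suspect you are about to bridge two clusters this way, a quick pre-check for name overlaps can help. The sketch below is hypothetical: it works on plain dictionaries mapping database names to table names, which you would have to collect from each cluster yourself, and it does not talk to RethinkDB at all.

```python
def name_collisions(cluster_a, cluster_b):
    """Return database names and db.table names present in both clusters."""
    shared_dbs = sorted(set(cluster_a) & set(cluster_b))
    shared_tables = sorted(
        f"{db}.{table}"
        for db in shared_dbs
        for table in set(cluster_a[db]) & set(cluster_b[db])
    )
    return shared_dbs, shared_tables

# Hypothetical inventories of two clusters that do not know about each other.
cluster_a = {"app": ["users", "orders"], "logs": ["events"]}
cluster_b = {"app": ["users", "carts"], "metrics": ["cpu"]}
print(name_collisions(cluster_a, cluster_b))
```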

Jan 15 '16 19:01 larkost

I was wondering more about the case:

- `--join` is specified once for each node in the cluster
- much of the cluster comes up concurrently (like after a power outage)

and if there were any potential races that would prevent a fully-connected cluster from forming. It seems pretty low risk, I'll admit.

Jan 15 '16 20:01 karlrasche

We don't currently have a lot of testing in that area, but I am not aware of any issues there. Thus far I have always seen it behave pretty robustly.

Jan 18 '16 21:01 larkost

I am having huge problems with my cluster. Servers keep disconnecting for an unknown reason (there is nothing in the logs). I think I might have the cluster set up wrong.

First things first: I understand sharding, and that is a great feature, but what is:

"n replica per shard"?

What does that mean?

Second thing: how do I configure a cluster with "n" servers? I have 6 servers because of sharding (I have a few docs with more than 10 million records), but I am not sure that I configured my cluster correctly.

On every server I wrote, for example (srv1.conf): join=srv2:port join=srv3:port join=srv4:port join=srv5:port join=srv6:port

Is this the correct way to add a server to a cluster?

There is nothing in the docs, and it would be great if you could post some "recommended" cluster configuration.

The third thing is about the hardware itself. Do I need to have identical servers?

I really hope that someone can help me with this, because when I had just one server my app was working all the time. Now every time some servers get disconnected, everything crashes. I am using nodejs rethinkdbdash.

Sep 20 '16 22:09 goors

@goors: You don't need identical servers for this. As far as I know, that's the proper way to list multiple servers to join in the configuration file -- one of the engineers can chime in here if that's wrong, but the documentation does say join= can be specified multiple times.
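For a six-server setup like yours, the `join=` lines for each configuration file could be generated with something like the Python sketch below. The hostnames `srv1` to `srv6` come from your example; the port 29015 is an assumption (RethinkDB's default cluster port), so substitute whatever port your servers actually use.

```python
def join_lines(this_host, all_hosts, port=29015):
    """Return one `join=host:port` config line per other server in the cluster."""
    return [f"join={host}:{port}" for host in all_hosts if host != this_host]

hosts = [f"srv{i}" for i in range(1, 7)]  # srv1 .. srv6

# Lines for srv1.conf; repeat with each hostname for the other files.
print("\n".join(join_lines("srv1", hosts)))
```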

Having said that, I don't think that's likely to be the cause of your crashes if they're happening when servers disconnect. I would check the troubleshooting logs for more information, and consider bringing this up in the RethinkDB user Slack or Stack Overflow. If you don't get an answer there, consider opening a bug report in the main rethinkdb repository rather than the docs.

Sep 20 '16 22:09 chipotle

@chipotle I am going crazy. Can you tell me more about this: "n replica per shard"?

What is the difference between having 1 or 6 replicas per shard?

What if I shut down 5 servers and leave only one? What happens to the data, since the data is sharded across 6 servers?

Sep 20 '16 23:09 goors
