Remove the restriction that cli --cluster create requires at least 3 primary nodes

Open enjoy-binbin opened this issue 1 year ago • 12 comments

Valkey itself does not prevent creating a cluster with 1 or 2 primaries; the only limitation is that such a cluster cannot do automatic failover. Remove this restriction from the cli and add an "are you sure" prompt so the user has to confirm.

This allows us to create a test cluster with the cli or with create-cluster.

enjoy-binbin avatar Sep 26 '24 08:09 enjoy-binbin

It was approved by @zuiderkwast back in https://github.com/redis/redis/pull/13051

enjoy-binbin avatar Sep 26 '24 08:09 enjoy-binbin

Example:

Starting 30001
Starting 30002
Requested to create a cluster with 1 primaries and 1 replicas per primary.
Valkey cluster requires at least 3 primary nodes for automatic failover. Are you sure? (type 'yes' to accept): yes
>>> Performing hash slots allocation on 2 node(s)...
Primary[0] -> Slots 0 - 16383
Adding replica 127.0.0.1:30002 to 127.0.0.1:30001
>>> Trying to optimize replicas allocation for anti-affinity
[WARNING] Some replicas are in the same host as their primary
M: 133dd0fc63b5ffb3c8ae7098f1b5dccf973389a8 127.0.0.1:30001
   slots:[0-16383] (16384 slots) master
S: 1813d9ec27372cf1404a41e25270b82f52b7e47f 127.0.0.1:30002
   replicates 133dd0fc63b5ffb3c8ae7098f1b5dccf973389a8
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

>>> Performing Cluster Check (using node 127.0.0.1:30001)
M: 133dd0fc63b5ffb3c8ae7098f1b5dccf973389a8 127.0.0.1:30001
   slots:[0-16383] (16384 slots) master
   1 additional replica(s)
S: 1813d9ec27372cf1404a41e25270b82f52b7e47f 127.0.0.1:30002
   slots: (0 slots) slave
   replicates 133dd0fc63b5ffb3c8ae7098f1b5dccf973389a8
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

enjoy-binbin avatar Sep 26 '24 08:09 enjoy-binbin

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.63%. Comparing base (bf8183d) to head (bb76f15). Report is 47 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1075      +/-   ##
============================================
+ Coverage     70.61%   70.63%   +0.01%     
============================================
  Files           114      114              
  Lines         61695    61695              
============================================
+ Hits          43568    43578      +10     
+ Misses        18127    18117      -10     
Files with missing lines Coverage Δ
src/valkey-cli.c 55.50% <100.00%> (+0.05%) :arrow_up:

... and 11 files with indirect coverage changes

codecov[bot] avatar Sep 26 '24 09:09 codecov[bot]

@valkey-io/core-team I'm not sure if this is a major decision, but please take a look when you have time.

enjoy-binbin avatar Sep 27 '24 02:09 enjoy-binbin

Now can we create a cluster with 1 or 2 primaries in production?

hwware avatar Sep 27 '24 08:09 hwware

> Now can we create a cluster with 1 or 2 primaries in production?

Yes, we can. You can write your own script, or just start a node and then assign all slots to it.

enjoy-binbin avatar Sep 27 '24 08:09 enjoy-binbin

> Now can we create a cluster with 1 or 2 primaries in production?

Yes, it was always possible, but valkey-cli refused to do it.

I believe it is also possible to create a cluster with 3 primaries and later migrate all the slots to just 2 or 1 primary.

zuiderkwast avatar Sep 27 '24 08:09 zuiderkwast

> > Now can we create a cluster with 1 or 2 primaries in production?
>
> Yes, we can. You can write your own script, or just start a node and then assign all slots to it.

Then how do we do the failover automatically? If it cannot fail over automatically, does it make sense in reality?

hwware avatar Sep 27 '24 13:09 hwware

> Then how do we do the failover automatically? If it cannot fail over automatically, does it make sense in reality?

@hwware It doesn't. Please see the diff. It displays a confirmation prompt like this, where you have to answer yes or no when there are fewer than 3 primaries:

        if (!confirmWithYes("Valkey cluster requires at least 3 primary nodes for "
                            "automatic failover. Are you sure?",
                            ignore_force))

zuiderkwast avatar Sep 27 '24 18:09 zuiderkwast

> does it make sense in reality?

Yes, at least it makes sense for development, testing, etc.

Anyway, the cluster itself allows it but the cli does not, so I think we can allow it in the cli, with a warning to make sure users understand the limitation. Are you convinced?

zuiderkwast avatar Sep 30 '24 08:09 zuiderkwast

> > does it make sense in reality?
>
> Yes, at least it makes sense for development, testing, etc.
>
> Anyway, the cluster itself allows it but the cli does not, so I think we can allow it in the cli, with a warning to make sure users understand the limitation. Are you convinced?

Honestly, I am not convinced. If this feature cannot be used in production, why do we need to develop and test it? At least from my personal view, I am not inclined to do this.

hwware avatar Oct 03 '24 09:10 hwware

A cluster without automatic failover can be used in production. It is already done.

Example: At Ericsson, there is a deployment of a small mobile network, with all servers and radio equipment in a truck, which can be used to set up a mobile network in a war zone or disaster zone. It has the same components as a normal mobile network deployment, including a valkey cluster, but it is minimal and without redundancy, so it is a cluster with only one node.

We have standardized all valkey deployments to use cluster and cluster clients, so even single nodes use it. It makes it easy to scale to more nodes when we need to.

It is also possible to achieve redundancy in another way: Have a complete redundancy of the whole system including application servers and databases.

I think there are more use cases, for example if you want sharding but don't care about high availability, as with cache data that you can afford to lose. You may want two primaries and no replicas in a cluster.

It is already possible to do this, but you can't use valkey-cli, so users have to write their own script. I think it's better that valkey-cli allows it, with a warning and an "are you sure" prompt.

zuiderkwast avatar Oct 03 '24 09:10 zuiderkwast

OK, with the great explanation by zuiderkwast above, I don't think this is a major decision. I now often use it to quickly create clusters for testing, and valkey-benchmark already supports it (see #266), so I am going to merge this one. Let me know if you think otherwise.

enjoy-binbin avatar Oct 17 '24 05:10 enjoy-binbin