
Multi Datacenter Repair

Open · shaurya10000 opened this issue · 3 comments

Hi

https://github.com/BrianGallew/cassandra_range_repair#multiple-datacenters says:

"If you have multiple datacenters in your ring, then you MUST specify the name of the datacenter containing the node you are repairing as part of the command-line options (--datacenter=DCNAME). Failure to do so will result in only a subset of your data being repaired (approximately data/number-of-datacenters). This is because nodetool has no way to determine the relevant DC on its own, which in turn means it will use the tokens from every ring member in every datacenter."
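To make the README's point concrete, here is a minimal sketch, assuming single-token nodes on the Murmur3Partitioner token space and two datacenters whose tokens are evenly interleaved. The token layout and the helpers `range_width`, `primary_range`, and `repaired_fraction` are hypothetical and are not taken from cassandra_range_repair; they only show how sorting every ring member's token together, regardless of DC, shrinks the range a node treats as its own.

```python
# Hypothetical illustration, not cassandra_range_repair's code: compares the
# range a node owns within its own DC with the range computed when tokens from
# every datacenter are sorted together.
from typing import List, Tuple

RING_SIZE = 2 ** 64  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1


def range_width(start: int, end: int) -> int:
    """Number of tokens in the wrap-around range (start, end]."""
    # A node whose predecessor is itself (a one-node ring) owns the whole ring.
    return (end - start) % RING_SIZE or RING_SIZE


def primary_range(own_token: int, ring_tokens: List[int]) -> Tuple[int, int]:
    """Return (start, end]: start is the token just before own_token."""
    tokens = sorted(ring_tokens)
    idx = tokens.index(own_token)
    # For idx == 0, tokens[-1] is the largest token, which closes the ring.
    return tokens[idx - 1], own_token


def repaired_fraction(own_token: int, dc_tokens: List[int], all_tokens: List[int]) -> float:
    """Share of the node's own-DC range covered when all DCs' tokens are mixed."""
    local = range_width(*primary_range(own_token, dc_tokens))
    mixed = range_width(*primary_range(own_token, all_tokens))
    return mixed / local


# Hypothetical layout: 4 evenly spaced nodes per DC, DC2 offset by half a step.
step = RING_SIZE // 4
dc1 = [-2 ** 63 + i * step for i in range(4)]
dc2 = [token + step // 2 for token in dc1]

for dc_name, dc_tokens in (("dc1", dc1), ("dc2", dc2)):
    for token in dc_tokens:
        fraction = repaired_fraction(token, dc_tokens, dc1 + dc2)
        print(f"{dc_name} token {token}: {fraction:.2f} of its own-DC range")
```

With two evenly interleaved DCs every node prints 0.50, matching the README's "approximately data/number-of-datacenters". With uneven offsets between DCs the per-node share varies, which is why the README calls it an approximation.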

So, if we run repair on every node in a multi-DC Cassandra cluster, do we need to specify --datacenter=name on every node, or are we fine without specifying the datacenter?

shaurya10000 · Mar 02 '21 15:03

Nope. Each node, in each DC, will end up doing a subset of what it owns rather than all of what it owns.

BrianGallew · Mar 03 '21 02:03

Hi Brian, thanks for your reply. Is the "No" about providing the datacenter option, or about not providing it? Sorry for being picky, but based on your answer I'll have to tell the team which parameter is correct.

I also went through the code, and it looks like the datacenter must be specified; otherwise the code sorts the tokens and creates the wrong token ranges.

shaurya10000 · Mar 03 '21 05:03

To be clear: you MUST specify the DC in any multi-DC deployment just like the docs say.

BrianGallew · Mar 03 '21 15:03
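Because the datacenter name has to be passed on every node of a multi-DC cluster, a team running the repair everywhere may want to derive it per host rather than hard-code it. A minimal sketch, assuming `nodetool info` prints a `Data Center : <name>` line (verify this against your Cassandra version). `parse_datacenter`, `local_datacenter`, and the sample output are hypothetical; only the `--datacenter=DCNAME` option comes from the thread.

```python
# Hypothetical helper, not part of cassandra_range_repair: extract the local
# node's datacenter from `nodetool info` text to build --datacenter=DCNAME.
import subprocess
from typing import Optional


def parse_datacenter(nodetool_info_output: str) -> Optional[str]:
    """Return the value of the 'Data Center' field, or None if it is absent."""
    for line in nodetool_info_output.splitlines():
        # Split on the first colon only; values may contain more colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Data Center":
            return value.strip()
    return None


def local_datacenter() -> Optional[str]:
    """Run `nodetool info` on this host (assumes nodetool is on PATH)."""
    result = subprocess.run(["nodetool", "info"], capture_output=True, text=True, check=True)
    return parse_datacenter(result.stdout)


# Abridged, hypothetical `nodetool info` output used for the demo below.
SAMPLE_INFO = """Gossip active          : true
Data Center            : dc1
Rack                   : rack1
"""

print(f"--datacenter={parse_datacenter(SAMPLE_INFO)}")  # --datacenter=dc1
```

Returning `None` when the field is missing lets a wrapper script refuse to start the repair instead of silently running without a datacenter, which is the failure mode the README warns about.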