cassandra
cassandra copied to clipboard
CEP-37 on Trunk
CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
Design doc: https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0
@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3
as contributors to https://github.com/Ericsson/ecchronos we would very much appreciate any last-minute-pre-CEP-vote review to this PR for Cassandra's new repair solution. (more technical/code review will continue post-CEP vote.)
CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
Design doc: https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0
Hi,
I think this looks promising! I have a few points:
- Combine ranges instead of splitting - in ecChronos we saw huge improvements in some scenarios (when the data is low or empty tables) compared to repairing 1 vnode at a time. This improvement scaled with amount of vnodes, although it might've been related to overhead due to running repairs through JMX. Not sure but might be worth investigating.
- Major versions, during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this then it would be nice for repairs to automatically pause during major version upgrades.
- Observabliity, i saw there're metrics but it would also be nice to see repair status with nodetool.
- Repair priority per table, not per node.
Thanks, @masokol, for the review! Please find my response here:
- Combine ranges instead of splitting - in ecChronos we saw huge improvements in some scenarios (when the data is low or empty tables) compared to repairing 1 vnode at a time. This improvement scaled with amount of vnodes, although it might've been related to overhead due to running repairs through JMX. Not sure but might be worth investigating.
- Generally, empty tables or tables with a small amount of data runs through pretty fast, in seconds, so it is not a major issue for smaller/empty tables.
- The current framework already has support to combine ranges through a setting (src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:266)
- Major versions, during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this then it would be nice for repairs to automatically pause during major version upgrades.
This is a great suggestion, and the framework can be enhanced easily - I just filed a new sub-ticket CASSANDRA-20013 to track this
- Observabliity, i saw there're metrics but it would also be nice to see repair status with nodetool.
There is already a new nodetool command that would print the current status. Please take a look at it here.
- Repair priority per table, not per node.
Currently, it will repair the tables randomly, but it can be enhanced to add a priority as a CQL table property that an end user can configure, which can also be enhanced easily. Just added this enhancement to the ticket mentioned above.
There are a few nits that I have added, but overall it looks good now, I am approving this PR from my side 👍
I a bit of time reviewing documentation and configuration today and have created CASSANDRA-20421 and https://github.com/jaydeepkumar1984/cassandra/pull/47 to improve the docs, configuration and to make the default configuration more conservative.
Rebase on top of the latest trunk has been completed; all the unit tests are passing on CircleCI
Many dtests are failing on CircleCI due to resource limitations. For that, I have run them internally inside our infrastructure and most of them are passing as expected.