cockroach
cockroach copied to clipboard
kv: make snapshot apply concurrency limit configurable
Please refer to each individual commits.
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.
Before a member of our team reviews your PR, I have some potential action items for you:
- Please ensure your git commit message contains a release note.
- When CI has completed, please ensure no errors have appeared.
I was unable to automatically find a reviewer. You can try CCing one of the following members:
- A person you worked with closely on this PR.
- The person who created the ticket, or a CRDB organization member involved with the ticket (author, commenter, etc.).
- Join our community slack channel and ask on #contributors.
- Try find someone else from here.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Thank you for updating your pull request.
My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.
I was unable to automatically find a reviewer. You can try CCing one of the following members:
- A person you worked with closely on this PR.
- The person who created the ticket, or a CRDB organization member involved with the ticket (author, commenter, etc.).
- Join our community slack channel and ask on #contributors.
- Try find someone else from here.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Thank you for updating your pull request.
My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.
I have added a few people who may be able to assist in reviewing:
- @nvanbenschoten (author of cockroachdb/cockroach#122236)
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Thank you for updating your pull request.
My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Thank you for updating your pull request.
Before a member of our team reviews your PR, I have some potential action items for you:
- We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with
--force
- When CI has completed, please ensure no errors have appeared.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
@lyang24 I'm thinking that there may actually be some benefit to making the snapshotSendQueue
's concurrency limit configurable as well, due to delegated snapshots. Each node's replicateQueue
will still only initiate a single rebalance operation at a time, but a larger snapshotSendQueue
limit will allow nodes in the same locality as a new/recovering node to delegate for multiple remote leaseholders at the same time.
For example, consider the following topology:
region1: {n1, n2, n3, n4, n5}
region2: {n6, n7, n8, n9, n10}
region3: {n11, n12, n13, n14, n15}
rf=3
leaseholder_preferences=none
If we add a new node 16 to region3, all 15 original nodes will have some snapshots that they want to send over to node 16. We would like these to all be delegated by the nodes in region 3. So to maximize upreplication speed, we would want to configure the following cluster settings like:
SET CLUSTER SETTING kv.store.raft_snapshot.apply_limit = 15;
SET CLUSTER SETTING kv.store.raft_snapshot.send_limit = 3;
What do you think about adding a third commit with that new cluster setting? It would look almost identical to the second commit.
was about to achieve great snapshot throughput with adjusted snapshot concurrency on a 9-9-2 cluster (replication 3-1-1) by docker pause on a node on chi and turn it back on the next morning.
SET CLUSTER SETTING kv.store.raft_snapshot.apply_limit = 15;
SET CLUSTER SETTING kv.store.raft_snapshot.send_limit = 10;
However, when i looked at the snapshot generated it seems like the number is well below the new concurrency limit, I wonder if that is a disk read bottleneck or could it be i missed some logic that is still limiting on the sender side?
cc @nvanbenschoten @andrewbaptist
re-reading nathan's comment maybe the low snapshot generation rate was due to the delegated snapshot, let me try pause a node in ash1, and set lease holder preferences to none
I also looked at the failed tests it seems like cluster setting override method will not trigger a set on change to update concurrency limit?
@lyang24 while you're working on this, just want to make sure you saw the review comments in https://github.com/cockroachdb/cockroach/pull/122762#pullrequestreview-2063963564 as well.
new testing results with delegated snapshot and lease preference disabled
one node received up to 1.5 gb of snapshot data per second, the 7hr paused node was up to speed in 6 mintues
@lyang24 how did the recipient of the 1.5GB/s of snapshots fare? Was it overloaded? Did its LSM retain its shape or did we see read amplification?
@lyang24 how did the recipient of the 1.5GB/s of snapshots fare? Was it overloaded? Did its LSM retain its shape or did we see read amplification? I don't see lsm overload from metrics.
Just chatted with Roblox colleagues this is quite helpful on our cockroachdb migration process which migrated data to 3 copies in ASH and then changes the leaseholder to distribute data between ASH OH AND CHI (this process used to take days on large tables). I wonder how far back are we able to backport this pr?
spoke with @andrewbaptist I will address the follow ups in later prs.
bors r+
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
- The backport branch might have already existed.
- There was a merge conflict.
- The backport branch contained merge commits.
You might need to create your backport manually using the backport tool.
error creating merge commit from 103a81fe07c8363286cb1c0d544798b84ac1da26 to blathers/backport-release-23.1-122762: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.1.x failed. See errors above.
error creating merge commit from 103a81fe07c8363286cb1c0d544798b84ac1da26 to blathers/backport-release-23.2-122762: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.2.x failed. See errors above.
error setting reviewers, but backport branch blathers/backport-release-24.1-122762 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/124997/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []
Backport to branch 24.1.x failed. See errors above.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.