Explain knobs for tuning load-based rebalancing
Andrew Woods (awoods187) commented:
From a recent customer conversation, we identified a need to tune load-based rebalancing. We have a number of tracking issues for this including the main tracking issue:
- https://github.com/cockroachdb/cockroach/issues/34752
We have identified various other secondary issues like:
- https://github.com/cockroachdb/cockroach/issues/34747
- https://github.com/cockroachdb/cockroach/issues/34590
- https://github.com/cockroachdb/cockroach/issues/34591
We should document the knobs we do have now and the impact they can have depending upon the setting.
- [ ] Set the GOMAXPROCS=24 environment variable when starting each node.
- [ ] Increase the rebalance snapshot rate limit via SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '8MiB';
- [ ] Increase the recovery snapshot rate limit via SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '32MiB';
These can allow users to support more load before encountering any degradation.
Jira Issue: DOC-245
cc @bdarnell for any additional thoughts
These are emergency settings for a particular customer's idiosyncratic setup; we don't want to recommend these across the board (at least the GOMAXPROCS one). And we're changing the snapshot rate defaults in 19.1 so we may not need any work there.
tl;dr: Can we close this issue?
@johnrk (or Andy or Ben), does this docs issue still need to exist in the v20.1+ world? I notice that 3/4 of the CRDB issues are still open, but I don't know what that means for this docs issue (if anything).
Ben, based on what you said, it seems like things got "better" with snapshot rates in the time since this issue was filed, so maybe docs for this are no longer needed?
> I notice that 3/4 of the CRDB issues are still open
I just closed another one, but the linked issues are really about possible product improvements/issues; they don't have any bearing on what we can or should document right now. (A more relevant issue for docs is https://github.com/cockroachdb/cockroach/issues/39200. We want to combine the recovery and rebalance settings, so whenever we give advice about one we should do the same for the other.)
For the specific docs suggestions:
- We don't want to document GOMAXPROCS=24, at least not for rebalancing reasons (but we should recommend it in k8s contexts: #5922)
- The rebalance rate now defaults to 8 MiB, so nothing to change there.
- It's still sometimes useful to increase these parameters, so we should document when and why. And whenever we change one, we should change the other (our TPC-C docs currently recommend only changing the "rebalance" setting, while our known issues page suggests only changing the "recovery" one).
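
As a rough sketch of the "whenever we change one, we should change the other" point, assuming a SQL session with permission to modify cluster settings. The '32MiB' value is purely illustrative and not a recommendation; appropriate values depend on the cluster's hardware and workload:

```sql
-- Check the current values before changing anything.
SHOW CLUSTER SETTING kv.snapshot_rebalance.max_rate;
SHOW CLUSTER SETTING kv.snapshot_recovery.max_rate;

-- Raise both limits together so that rebalance and recovery
-- snapshots are tuned in step. '32MiB' is an illustrative value only.
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '32MiB';
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '32MiB';

-- Return both settings to their defaults once the higher rate
-- is no longer needed.
RESET CLUSTER SETTING kv.snapshot_rebalance.max_rate;
RESET CLUSTER SETTING kv.snapshot_recovery.max_rate;
```

Pairing the two statements this way avoids the mismatch noted above, where one page advises changing only the "rebalance" setting and another only the "recovery" one.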
> It's still sometimes useful to increase these parameters, so we should document when and why. And whenever we change one, we should change the other (our TPC-C docs currently recommend only changing the "rebalance" setting, while our known issues page suggests only changing the "recovery" one).
OK thanks Ben. Keeping this one on the TODO list then
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB docs!