Explain knobs for tuning load-based rebalancing

Open awoods187 opened this issue 6 years ago • 6 comments

Andrew Woods (awoods187) commented:

From a recent customer conversation, we identified a need to tune load-based rebalancing. We have a number of tracking issues for this including the main tracking issue:

  • https://github.com/cockroachdb/cockroach/issues/34752

We have identified various other secondary issues like:

  • https://github.com/cockroachdb/cockroach/issues/34747
  • https://github.com/cockroachdb/cockroach/issues/34590
  • https://github.com/cockroachdb/cockroach/issues/34591

We should document the knobs we do have now and the impact they can have depending upon the setting.

  • [ ] GOMAXPROCS=24 environment variable set when starting each node.
  • [ ] Increase the snapshot rate limits: SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '8MiB'
  • [ ] SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '32MiB'

These can allow users to support more load before encountering any degradation.
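
As a rough sketch, the two snapshot settings above would be applied from a SQL shell roughly as follows. The values are the ones listed in this issue, not general recommendations, and GOMAXPROCS is a process environment variable rather than something set through SQL.

```sql
-- Raise the rate limit for snapshots sent while rebalancing replicas.
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '8MiB';

-- Raise the rate limit for snapshots sent while recovering (up-replicating) ranges.
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '32MiB';

-- Confirm the values now in effect.
SHOW CLUSTER SETTING kv.snapshot_rebalance.max_rate;
SHOW CLUSTER SETTING kv.snapshot_recovery.max_rate;
```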

Jira Issue: DOC-245

awoods187 avatar Feb 21 '19 20:02 awoods187

cc @bdarnell for any additional thoughts

awoods187 avatar Feb 21 '19 20:02 awoods187

These are emergency settings for a particular customer's idiosyncratic setup; we don't want to recommend these across the board (at least the GOMAXPROCS one). And we're changing the snapshot rate defaults in 19.1 so we may not need any work there.

bdarnell avatar Feb 21 '19 20:02 bdarnell

tl;dr: Can we close this issue?

@johnrk (or Andy or Ben), does this docs issue still need to exist in the v20.1+ world? I notice that 3/4 of the CRDB issues are still open, but I don't know what that means for this docs issue (if anything).

Ben, based on what you said, it seems like things got "better" with snapshot rates in the time since this issue was filed, so maybe docs for this are no longer needed?

rmloveland avatar Jul 28 '20 18:07 rmloveland

> I notice that 3/4 of the CRDB issues are still open

I just closed another one, but the linked issues are really about possible product improvements/issues; they don't really have any bearing on what we can or should document right now. (A more relevant issue for docs is https://github.com/cockroachdb/cockroach/issues/39200. We want to combine the recovery and rebalance settings, so whenever we give advice about one we should do the same for the other.)

For the specific docs suggestions:

  • We don't want to document GOMAXPROCS=24, at least not for rebalancing reasons (but we should recommend it in k8s contexts: #5922)
  • The rebalance rate now defaults to 8 MiB, so nothing to change there.
  • It's still sometimes useful to increase these parameters, so we should document when and why. And whenever we change one, we should change the other (our TPC-C docs currently recommend only changing the "rebalance" setting, while our known issues page suggests only changing the "recovery" one).
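
A minimal sketch of the "change one, change the other" advice, assuming the setting names as they existed at the time of this thread (check them against your version). The 32MiB value is purely illustrative and not a recommendation.

```sql
-- Raise both snapshot rate limits together so that rebalancing and
-- recovery traffic are tuned consistently (illustrative value only).
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '32MiB';
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '32MiB';

-- Once the extra snapshot throughput is no longer needed, return both
-- settings to their defaults together as well.
RESET CLUSTER SETTING kv.snapshot_rebalance.max_rate;
RESET CLUSTER SETTING kv.snapshot_recovery.max_rate;
```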

bdarnell avatar Jul 28 '20 19:07 bdarnell

> It's still sometimes useful to increase these parameters, so we should document when and why. And whenever we change one, we should change the other (our TPC-C docs currently recommend only changing the "rebalance" setting, while our known issues page suggests only changing the "recovery" one).

OK thanks Ben. Keeping this one on the TODO list then

rmloveland avatar Jul 29 '20 14:07 rmloveland

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB docs!

github-actions[bot] avatar Jul 26 '23 11:07 github-actions[bot]