v8.5.1: after scaling out 3 TiKV nodes, region balancing to one of them gets stuck
Bug Report
What did you do?
Scaled out 3 TiKV nodes. Two of them work fine and many regions have been balanced to them, but region balancing to the third TiKV is stuck.
After enabling debug logging, we found some issues with the score calculation.
What did you expect to see?
What did you see instead?
What version of PD are you using (pd-server -V)?
v8.5.1
/assign @nolouch
/report customer
This may be related to the cluster's size: it is a large cluster where the average region count is about 66k and the average region size is 200 MB.
As a workaround, the user can run `pd-ctl config set tolerant-size-ratio 1`. This shrinks the tolerant size (the balance buffer within which PD considers two stores balanced), but it also produces some extra balance operators, so the user needs to restore the original value after rebalancing finishes.
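A minimal sketch of the workaround as a save-apply-restore sequence. The PD address and the `PD_CTL`/`PD_ADDR` variables are assumptions for illustration (not from the report); set `PD_CTL` to something like `tiup ctl:v8.5.1 pd` if `pd-ctl` is not on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the tolerant-size-ratio workaround.
# PD_CTL and PD_ADDR are placeholders; override them for your cluster.
PD_CTL=${PD_CTL:-pd-ctl}
PD_ADDR=${PD_ADDR:-http://127.0.0.1:2379}

# Print the current config so the existing tolerant-size-ratio
# value can be noted and restored later.
# $PD_CTL is left unquoted on purpose so a multi-word value
# such as "tiup ctl:v8.5.1 pd" is split into separate arguments.
show_config() {
  $PD_CTL -u "$PD_ADDR" config show
}

# Shrink the balance buffer, per the workaround in the report.
apply_workaround() {
  $PD_CTL -u "$PD_ADDR" config set tolerant-size-ratio 1
}

# Put back the value noted from show_config once rebalancing is done,
# to stop the extra balance operators the smaller buffer produces.
restore_ratio() {
  $PD_CTL -u "$PD_ADDR" config set tolerant-size-ratio "$1"
}
```

Typical use: source the script, run `show_config` and note the current `tolerant-size-ratio`, run `apply_workaround`, and once the stuck store's region count has caught up, run `restore_ratio <noted value>`.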