v8.5.1: scale out 3 tikvs, one tikv became stuck during region-balance

Open mayjiang0203 opened this issue 9 months ago • 1 comments

Bug Report

What did you do?

Scaled out 3 TiKV nodes. Two of them work fine and many regions have been balanced to them, but region balancing to the third TiKV is stuck.

Image

Image

After enabling debug logging, the logs show some issues with the score calculation.

Image

What did you expect to see?

What did you see instead?

What version of PD are you using (pd-server -V)?

v8.5.1

mayjiang0203, Mar 14 '25 04:03

/assign @nolouch

mayjiang0203, Mar 14 '25 04:03

/report customer

seiya-annie, Apr 22 '25 07:04

This may happen in a large cluster: here the average region count is 66k and the average region size is 200 MB.

bufferflies, Jun 26 '25 07:06

The user can run the command pd-ctl config set tolerant-size-ratio 1 as a workaround. It makes the reimbursement small, but it also brings some extra balance operators, so the user needs to reset it after the rebalance has finished.
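To illustrate why a smaller tolerant-size-ratio can unblock balancing, here is a rough sketch. It is not PD's actual scoring code. It only models the documented idea that two stores are treated as balanced when their score difference is smaller than tolerant-size-ratio times the region size. The function names, the store scores, and the ratio of 50 are hypothetical; only the 200 MB region size comes from this issue.

```python
def tolerance_mb(region_size_mb: float, tolerant_size_ratio: float) -> float:
    """Balance buffer in MB: the ratio multiplied by the region size."""
    return region_size_mb * tolerant_size_ratio


def considered_balanced(source_score_mb: float, target_score_mb: float,
                        region_size_mb: float, tolerant_size_ratio: float) -> bool:
    """True when the score gap is below the buffer, so no balance operator is created.

    Simplified model: real PD scores are weighted and are not raw sizes.
    """
    gap = source_score_mb - target_score_mb
    return gap < tolerance_mb(region_size_mb, tolerant_size_ratio)


if __name__ == "__main__":
    region_size_mb = 200          # average region size reported in this issue
    source, target = 10_000, 4_000  # hypothetical store scores
    for ratio in (1, 50):         # 50 is a made-up large ratio for contrast
        state = "balanced (no move)" if considered_balanced(
            source, target, region_size_mb, ratio) else "not balanced (keeps moving)"
        print(f"ratio={ratio}: buffer={tolerance_mb(region_size_mb, ratio)} MB -> {state}")
```

With a ratio of 1 the buffer is only one region wide, so the new store keeps receiving regions, but the scheduler also reacts to much smaller differences between stores, which would explain the extra balance operators.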

bufferflies avatar Jun 30 '25 02:06 bufferflies