After an online unsafe recovery, "halt-scheduling" remains set to true when PD is reloaded
Bug Report
What did you do?
What did you expect to see?
"halt-scheduling" should be false once the recovery has finished.
What did you see instead?
[2024/04/18 16:16:08.515 +08:00] [INFO] [cluster.go:1093] ["will run cmd"] [cmd:="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 unsafe remove-failed-stores show"]
{
"info": "Unsafe recovery Finished",
"time": "2024-04-18 16:15:42.491",
[2024/04/18 16:16:22.872 +08:00] [INFO] [cmd.go:197] ["Remote command finished"] [cmd="tiup cluster reload tidbcluster -R pd -y"] [exitcode=0] []
[2024/04/18 16:16:24.293 +08:00] [INFO] [pdutil.go:512] ["run pd ctl command"] [pdCmd="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 config show all"]
What version of PD are you using (pd-server -V)?
v8.1.0
[2024/04/18 15:15:22.453 +08:00] [INFO] [workloadnode.run] [util.go:255] ["/tiup/deploy/pd-/bin/pd-server -V"] [workload=pd2]
[2024/04/18 15:15:22.455 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="/tiup/deploy/pd-/bin/pd-server -V"] [nodename=pd2]
2024-04-18T15:15:22.455+0800 INFO k8s/client.go:223 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
Release Version: v8.1.0
Edition: Community
Git Commit Hash: 3ec92bdc67cb6cea9de832e369dabfe7a2a5fa59
Git Branch: HEAD
UTC Build Time: 2024-04-15 03:59:49
/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4
@mayjiang0203: These labels are not set on the issue: affects-7.5, affects-7.1, affects-6.5, affects-6.1, affects-5.4.
In response to this:
/severity major
/label affects-8.1
/remove-label affects-7.5
/remove-label affects-7.1
/remove-label affects-6.5
/remove-label affects-6.1
/remove-label affects-5.4
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
@mayjiang0203: These labels are not set on the issue: may-affects-7.5, may-affects-7.1, may-affects-6.5, may-affects-6.1, may-affects-5.4.
Impact of this bug: reloading the cluster becomes very slow because leader eviction no longer works, so restarting TiKV has to wait for a 10-minute timeout. Workaround: reload PD first, then run "config set halt-scheduling false" with pd-ctl, and only after that reload the cluster.
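The workaround steps can be scripted roughly as below. This is a sketch under stated assumptions, not an official procedure: CLUSTER, PD_URL and CTL_VERSION are placeholder settings (the report's logs used cluster "tidbcluster" and ctl:v8.1.0-pre), the helper names run and apply_workaround are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: reload PD, clear halt-scheduling, then reload the cluster.
# Assumptions (not from the report): the variables below are placeholders, and
# commands are only printed unless DRY_RUN=0 is exported.
set -euo pipefail

CLUSTER="${CLUSTER:-tidbcluster}"          # cluster name seen in the report's logs
PD_URL="${PD_URL:-http://127.0.0.1:2379}"  # placeholder PD address
CTL_VERSION="${CTL_VERSION:-v8.1.0}"       # assumed to match the PD version

# Print the command by default (dry run); execute it only when DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" != "0" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

apply_workaround() {
  # Step 1: reload only the PD component.
  run tiup cluster reload "$CLUSTER" -R pd -y
  # Step 2: clear the flag that is left at true after online unsafe recovery.
  run tiup "ctl:${CTL_VERSION}" pd -u "$PD_URL" config set halt-scheduling false
  # Step 3: reload the whole cluster; leader eviction is expected to work again.
  run tiup cluster reload "$CLUSTER" -y
}

# Run the steps only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  apply_workaround
fi
```

Saved as, for example, workaround.sh, running `bash workaround.sh` prints the three commands; `DRY_RUN=0 PD_URL=http://<pd-host>:2379 bash workaround.sh` executes them. Defaulting to a dry run is a deliberate choice, since reloading a production cluster is disruptive.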
/found customer