After performing an online recovery, "halt-scheduling" has been set to true when reloading pd

Open mayjiang0203 opened this issue 1 year ago • 4 comments

Bug Report

What did you do?

What did you expect to see?

"halt-scheduling" should be set to false.

What did you see instead?

[2024/04/18 16:16:08.515 +08:00] [INFO] [cluster.go:1093] ["will run cmd"] [cmd:="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 unsafe remove-failed-stores show"]
  {
    "info": "Unsafe recovery Finished",
    "time": "2024-04-18 16:15:42.491",
[2024/04/18 16:16:22.872 +08:00] [INFO] [cmd.go:197] ["Remote command finished"] [cmd="tiup cluster reload tidbcluster -R pd -y"] [exitcode=0] []
[2024/04/18 16:16:24.293 +08:00] [INFO] [pdutil.go:512] ["run pd ctl command"] [pdCmd="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 config show all"]

What version of PD are you using (pd-server -V)?

v8.1.0

[2024/04/18 15:15:22.453 +08:00] [INFO] [workloadnode.run] [util.go:255] ["/tiup/deploy/pd-/bin/pd-server -V"] [workload=pd2]
[2024/04/18 15:15:22.455 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="/tiup/deploy/pd-/bin/pd-server -V"] [nodename=pd2]
2024-04-18T15:15:22.455+0800 INFO k8s/client.go:223 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
Release Version: v8.1.0
Edition: Community
Git Commit Hash: 3ec92bdc67cb6cea9de832e369dabfe7a2a5fa59
Git Branch: HEAD
UTC Build Time: 2024-04-15 03:59:49

mayjiang0203, Apr 18 '24 08:04

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4

mayjiang0203, Apr 18 '24 08:04

@mayjiang0203: These labels are not set on the issue: affects-7.5, affects-7.1, affects-6.5, affects-6.1, affects-5.4.

In response to this:

/severity major
/label affects-8.1
/remove-label affects-7.5
/remove-label affects-7.1
/remove-label affects-6.5
/remove-label affects-6.1
/remove-label affects-5.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot[bot], Apr 19 '24 01:04

@mayjiang0203: These labels are not set on the issue: may-affects-7.5, may-affects-7.1, may-affects-6.5, may-affects-6.1, may-affects-5.4.

In response to this:

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4

ti-chi-bot[bot], Apr 23 '24 03:04

Impact of this bug: reloading the cluster becomes very slow, because leader eviction no longer works and restarting TiKV has to wait for a 10-minute timeout. Workaround: reload PD first, then run "config set halt-scheduling false"; after that the cluster can be reloaded.
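
The workaround can be sketched as a small shell script. This is a minimal sketch rather than a tested procedure: the cluster name and the ctl version are copied from the logs in this report, PD_ADDR is a placeholder you must replace, and the run helper and DRY_RUN switch are hypothetical additions for illustration. The "config show all" step is only there so the value can be checked before the full reload.

```shell
#!/bin/sh
# Sketch of the workaround: reload PD, clear halt-scheduling, then reload the cluster.
# CLUSTER and CTL_VERSION come from the logs in this report; PD_ADDR is a placeholder.
set -eu

CLUSTER="${CLUSTER:-tidbcluster}"
CTL_VERSION="${CTL_VERSION:-v8.1.0-pre}"
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"   # assumption: replace with a real PD endpoint
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands (hypothetical switch)

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

workaround() {
  # 1. Reload only the PD nodes.
  run tiup cluster reload "$CLUSTER" -R pd -y
  # 2. Clear the flag that stayed true after the recovery.
  run tiup "ctl:$CTL_VERSION" pd -u "$PD_ADDR" config set halt-scheduling false
  # 3. Show the config so the value can be checked by eye.
  run tiup "ctl:$CTL_VERSION" pd -u "$PD_ADDR" config show all
  # 4. Reload the cluster now that leader eviction should work again.
  run tiup cluster reload "$CLUSTER" -y
}

workaround
```

By default the script only prints the four commands; set DRY_RUN=0 and PD_ADDR to a reachable PD endpoint to actually run them.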

mayjiang0203, Apr 28 '24 02:04

/found customer

seiya-annie, Jun 11 '24 10:06