When upgrading or reloading a cluster, increase retries when accessing PD
Bug Report
- What did you do? tiup cluster upgrade <cluster_name>
In the TiKV evict leader phase, the upgrade failed with: error requesting pd api, response: no leader
- What did you expect to see?
After investigation, I found that, because of the leader priority setting in PD, a leader switch occurred during the PD stage of the cluster upgrade. PD then checked the leader priority every minute, which triggered a PD leader transfer that took 0.5 seconds.
Coincidentally, during this 0.5-second window, the upgrade cluster process had already reached the TiKV stage and was performing the "set leader evict scheduler" operation, resulting in a "no leader" error when accessing PD, which caused TiUP to exit.
I think a retry mechanism should be added when calling the PD API to prevent TiUP upgrade or reload operations from being interrupted due to such short-term changes in PD.
- What did you see instead? TiUP exited with an error.
- What version of TiUP are you using (tiup --version)? v1.14.0