pd
Scheduling service takes more than 5 minutes to recover after a network partition is injected on the scheduling primary
Enhancement
What did you do?
1. Run a workload.
2. Inject a network partition between the scheduling primary and all other pods.
What did you expect to see?
The scheduling service recovers in less than 5 minutes after a network partition is injected on the scheduling primary.
What did you see instead?
The scheduling service takes more than 5 minutes to recover after a network partition is injected on the scheduling primary.
What version of PD are you using (pd-server -V)?
./pd-server -V
Release Version: v8.0.0-alpha
Edition: Community
Git Commit Hash: e199866f59e22e3759a8e9459ef33d57f784890d
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-02-26 11:38:17
2024-02-28T11:55:27.776+0800
/assign rleungx
The recovery time depends on the hibernate region tick interval: currently, switching the scheduling primary does not wake all regions, so the prepare checker cannot receive heartbeats from all regions in time.