tiup fails to reload the cluster when one TiKV node fails permanently
Bug Report
Please answer these questions before submitting your issue. Thanks!
- What did you do?
A TiKV node failed because of a bug. I built a new binary and tried to patch the cluster with this command:
% tiup cluster patch test-cluster tikv-server.tar.gz -R tikv --transfer-timeout 1
- What did you expect to see?
tiup can patch the cluster successfully.
- What did you see instead?
It fails with the following errors:
failed counting leader on 127.0.0.122:20160 (status addr http://127.0.0.122:20180/metrics), executing GET request for URL "http://127.0.0.122:20180/metrics" failed: Get http://127.0.0.122:20180/metrics: dial tcp 127.0.0.122:20180: connect: connection refused
Error: failed to evict store leader 127.0.0.122: metric tikv_raftstore_region_count{type="leader"} not found
127.0.0.122 is the store that has failed permanently because of the bug.
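The two error lines suggest that the leader check reads the store's metrics endpoint and looks for the tikv_raftstore_region_count{type="leader"} sample. Below is a minimal Python sketch of that kind of check. It is not tiup's actual code (tiup is written in Go), and the function name leader_count is hypothetical:

```python
import re
from typing import Optional

# Matches a Prometheus text-format sample such as:
#   tikv_raftstore_region_count{type="leader"} 12
# Assumption: the label set contains type="leader", possibly among other labels.
_SAMPLE = re.compile(r'^tikv_raftstore_region_count\{([^}]*)\}\s+(\S+)')


def leader_count(metrics_text: str) -> Optional[int]:
    """Return the leader count found in the metrics text, or None if absent."""
    for line in metrics_text.splitlines():
        m = _SAMPLE.match(line.strip())
        if m and 'type="leader"' in m.group(1):
            # Prometheus values are floats in text form; leader counts are whole numbers.
            return int(float(m.group(2)))
    return None
```

When the store is unreachable there is no metrics text to parse, so a check like this has nothing to find. That would be consistent with the "metric ... not found" error following the "connection refused" error.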
- What version of TiUP are you using (tiup --version)?
v1.3.1 tiup
Go Version: go1.13
Git Branch: release-1.3
GitHash: d51bd0c
Related to #661. I have hit more than one problem with the leader-eviction mechanism in cases where it is completely unnecessary. A flag to disable it entirely would be a lifesaver.
Maybe we can add a --force flag to tiup cluster patch to ignore this error in test environments.
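One way the proposed behaviour could work, sketched in Python under stated assumptions: can_proceed and count_leaders are hypothetical names, and --force is only the flag proposed here, not an existing tiup option. In this sketch, force affects only the case where the store cannot be reached:

```python
from typing import Callable


def can_proceed(count_leaders: Callable[[], int], force: bool) -> bool:
    """Decide whether the restart of a store may proceed.

    count_leaders is a hypothetical callable that returns the number of
    leaders still on the store and raises OSError if the store is unreachable.
    """
    try:
        remaining = count_leaders()
    except OSError:
        # ConnectionRefusedError is a subclass of OSError.
        # An unreachable store is skipped only when force is set.
        return force
    # A reachable store must have no leaders left before it is restarted.
    return remaining == 0
```

With this design a healthy store still waits for leader eviction, so the flag would not weaken the normal path. It would only let the patch continue past a store that is already down.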
I encountered this again: one TiKV node keeps panicking, and the cluster can't be reloaded with a patched binary.