tiup fails to reload the cluster when one TiKV node fails permanently

Open BusyJay opened this issue 4 years ago • 2 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

A TiKV node failed because of a bug. I built a new binary and tried to patch the cluster with the following command:

% tiup cluster patch test-cluster tikv-server.tar.gz -R tikv --transfer-timeout 1

  2. What did you expect to see?

tiup can patch the cluster successfully.

  3. What did you see instead?

It fails with the following errors:

failed counting leader on 127.0.0.122:20160 (status addr http://127.0.0.122:20180/metrics), executing GET request for URL "http://127.0.0.122:20180/metrics" failed: Get http://127.0.0.122:20180/metrics: dial tcp 127.0.0.122:20180: connect: connection refused

Error: failed to evict store leader 127.0.0.122: metric tikv_raftstore_region_count{type="leader"} not found

127.0.0.122 is the store that failed permanently because of the bug.
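
To illustrate why the second error follows from the first, here is a minimal Python sketch of looking up the leader count in a Prometheus-style metrics payload. It is not tiup's actual code: the function name `leader_count` and the sample payload are assumptions made for illustration, and only the metric name is taken from the error message above. When the status address refuses the connection there is no payload to parse, so the lookup reports the metric as missing.

```python
from typing import Optional

# Metric name and label copied from the error message in this report.
LEADER_METRIC = 'tikv_raftstore_region_count{type="leader"}'


def leader_count(metrics_text: str) -> Optional[int]:
    """Return the leader count found in a metrics payload, or None if absent.

    Hypothetical helper for illustration only, not tiup's implementation.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if line.startswith(LEADER_METRIC):
            # The sample value follows the metric name; a timestamp may follow it.
            parts = line[len(LEADER_METRIC):].split()
            if parts:
                return int(float(parts[0]))
    return None


# Made-up payload from a healthy store (values are illustrative).
SAMPLE = """# a comment line
tikv_raftstore_region_count{type="leader"} 12
tikv_raftstore_region_count{type="region"} 30
"""

healthy = leader_count(SAMPLE)
# A store that refuses connections yields no payload at all.
dead = leader_count("")
```

A caller that treats a missing metric as a hard error can never finish evicting leaders from a store that is down, even though such a store presumably holds no leaders.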

  4. What version of TiUP are you using (tiup --version)?

v1.3.1 tiup
Go Version: go1.13
Git Branch: release-1.3
GitHash: d51bd0c

This is related to #661. I have hit more than one problem with the leader-eviction mechanism in cases where eviction is totally unnecessary. A flag to disable it completely would be a lifesaver.

BusyJay, Jan 15 '21 12:01

Maybe we can add a --force flag to tiup cluster patch to ignore this error in test environments.

lucklove, Jan 20 '21 12:01
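
The behaviour suggested in the comment above could look roughly like the following Python sketch. Everything here is hypothetical: `patch_store`, its `force` parameter, and the two callbacks are illustrative names, not part of tiup. The sketch only shows the control flow, in which a failed leader eviction becomes a warning rather than a fatal error when the caller opts in.

```python
from typing import Callable, List


def patch_store(
    evict_leaders: Callable[[], None],
    replace_binary: Callable[[], None],
    force: bool = False,
) -> List[str]:
    """Evict leaders, then replace the binary; return any warnings collected.

    Hypothetical sketch of a "force" option, not tiup's implementation.
    """
    warnings: List[str] = []
    try:
        evict_leaders()
    except Exception as err:
        if not force:
            # Default behaviour: an eviction failure aborts the patch.
            raise
        # Forced behaviour: record the failure and carry on.
        warnings.append(f"ignoring leader eviction failure: {err}")
    replace_binary()
    return warnings
```

With `force=False` the function behaves like the failure reported in this issue. With `force=True` the binary is still replaced on a store whose status address is unreachable.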

I encountered this again: one TiKV node keeps panicking, and the cluster can't be reloaded with a patched binary.

BusyJay, Jul 14 '21 11:07