Inconsistent store pointers between the region-cache and store-cache cause stale regions to become inaccessible.
I believe this is a bug related to inconsistent state between region-cache and store-cache when a TiKV store updates its address or labels.
https://github.com/tikv/client-go/blob/01758810e8419b784c0b652ad32ef03664df50bd/internal/locate/store_cache.go#L494-L521
As the code above shows, when the address or labels of a TiKV instance are updated, a new store object is created and replaces the old one in store-cache. We can confirm this from the log:
store address or labels changed, add new store and mark old store deleted...
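The replacement step can be modeled roughly as follows. This is a simplified sketch, not client-go's code: `Store`, `storeCache`, `update`, and `labelsEqual` are hypothetical names, and the sketch only assumes the behaviour described above (a new object is created when the address or labels change, and the old object is marked deleted).

```go
package main

import "fmt"

// Store is a hypothetical, simplified stand-in for a store object.
type Store struct {
	ID      uint64
	Addr    string
	Labels  map[string]string
	Deleted bool
}

// storeCache maps a store ID to the current *Store (hypothetical type).
type storeCache struct {
	stores map[uint64]*Store
}

// labelsEqual reports whether two label maps hold the same key/value pairs.
func labelsEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// update keeps the existing pointer when nothing changed; otherwise it
// creates a new Store, marks the old one deleted, and swaps the cache entry.
// Anyone else still holding the old pointer is not updated.
func (c *storeCache) update(id uint64, addr string, labels map[string]string) *Store {
	old, ok := c.stores[id]
	if ok && old.Addr == addr && labelsEqual(old.Labels, labels) {
		return old
	}
	fresh := &Store{ID: id, Addr: addr, Labels: labels}
	if ok {
		old.Deleted = true
	}
	c.stores[id] = fresh
	return fresh
}

func main() {
	c := &storeCache{stores: map[uint64]*Store{}}
	before := c.update(1, "tikv-0:20160", map[string]string{"zone": "z1"})
	after := c.update(1, "tikv-0-new:20160", map[string]string{"zone": "z1"})
	// Different object, old one marked deleted, cache points at the new one.
	fmt.Println(before == after, before.Deleted, c.stores[1] == after)
}
```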
However, the store pointers held by region-cache are not replaced with the new store. For a region whose leader (as recorded in region-cache) is on this TiKV, the leader's status never changes and it stays unavailable:
https://github.com/tikv/client-go/blob/01758810e8419b784c0b652ad32ef03664df50bd/internal/locate/region_request.go#L804-L811
When accessing regions that were not previously cached, the new store pointer is used, so their leader may become available again.
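A minimal sketch of why already-cached regions stay stuck while newly loaded ones recover, assuming (as described above) that a cached region keeps a direct pointer to its leader's store. `Region`, `Store`, and `leaderReachable` are hypothetical names, and the single `Deleted` flag is a simplification of the real liveness handling in client-go.

```go
package main

import "fmt"

// Store is a hypothetical, simplified store object.
type Store struct {
	ID      uint64
	Addr    string
	Deleted bool
}

// Region is a hypothetical cached region holding a direct pointer to the
// store of its leader, captured when the region was loaded.
type Region struct {
	ID     uint64
	Leader *Store
}

// leaderReachable models the request path: the leader is usable only if the
// store object the region points at has not been marked deleted.
func leaderReachable(r *Region) bool {
	return !r.Leader.Deleted
}

func main() {
	stores := map[uint64]*Store{1: &Store{ID: 1, Addr: "tikv-0:20160"}}

	// Region 100 is cached while store 1 still has its old address.
	oldRegion := &Region{ID: 100, Leader: stores[1]}

	// Store 1 changes address: the store cache swaps in a new object and
	// marks the old one deleted, but oldRegion still points at the old one.
	stores[1].Deleted = true
	stores[1] = &Store{ID: 1, Addr: "tikv-0-new:20160"}

	// Region 200 is loaded afterwards and picks up the new pointer.
	newRegion := &Region{ID: 200, Leader: stores[1]}

	fmt.Println(leaderReachable(oldRegion), leaderReachable(newRegion))
	fmt.Println(oldRegion.Leader.Addr, newRegion.Leader.Addr)
}
```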
There is a related issue, https://github.com/tikv/client-go/issues/1401, and a related fix, https://github.com/tikv/client-go/pull/1402. However, that fix only stops the health check for the old store object; it still does not replace the store pointers in region-cache.
Here is my question: why don't we reuse the old store object directly instead of creating a new one?
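For comparison, a hypothetical sketch of the alternative asked about here: keep one `Store` object per store ID and swap its address and labels in place under a lock, so every pointer already held by region-cache sees the update. This is not client-go's current behaviour; `Store`, `storeMeta`, `Meta`, and `updateInPlace` are illustrative names only.

```go
package main

import (
	"fmt"
	"sync"
)

// storeMeta holds the mutable metadata of a store (hypothetical type).
type storeMeta struct {
	Addr   string
	Labels map[string]string // replaced wholesale; callers must not mutate it
}

// Store keeps a stable identity; only its metadata changes, under a lock.
type Store struct {
	ID   uint64
	mu   sync.RWMutex
	meta storeMeta
}

// Meta returns a snapshot of the current address and labels.
func (s *Store) Meta() storeMeta {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.meta
}

// updateInPlace swaps address and labels on the existing object, so every
// *Store held elsewhere (e.g. by cached regions) observes the change.
func (s *Store) updateInPlace(addr string, labels map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.meta = storeMeta{Addr: addr, Labels: labels}
}

// Region is a hypothetical cached region pointing at its leader's store.
type Region struct {
	ID     uint64
	Leader *Store
}

func main() {
	store := &Store{ID: 1, meta: storeMeta{Addr: "tikv-0:20160"}}
	cached := &Region{ID: 100, Leader: store}

	store.updateInPlace("tikv-0-new:20160", map[string]string{"zone": "z1"})

	// The cached region sees the new address without being reloaded.
	fmt.Println(cached.Leader.Meta().Addr)
}
```

The sketch ignores any other per-store state tied to the old address (for example connections or health-check state), which would also need to be reset in place; that may be one reason to create a new object instead, but this is an assumption, not something the linked code states.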
Workaround: restart the TiDB instance