client-rust

Fix stale region cache with no leader

Open yongman opened this issue 1 year ago • 2 comments

When a client is created while tikv-server is starting up and a region has not elected a leader yet, the client's region cache may hold a stale entry with no leader.

As a result, every access to that region returns a no leader error until the region's id_ver changes.

yongman avatar Mar 11 '24 03:03 yongman

It seems that if there is still no leader when we read the region from the PD server, we would get the no leader error all the same.

How about handling this situation uniformly in handle_region_error? Then this error can be retried, with backoff to avoid putting too much pressure on the PD servers.

(Some related code will likely need to change too, since this error is raised in apply_shard. Maybe we can pass the region_store to single_shard_handler and handle the no leader condition there.)
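A minimal sketch of the retry-with-backoff idea, not the client's actual code: `RegionError`, `Backoff`, and `retry_on_no_leader` are hypothetical names made up for illustration, and the sleep function is passed in so the caller decides how to wait.

```rust
use std::time::Duration;

/// Hypothetical error type for this sketch; the real client has its own error enum.
#[derive(Debug, PartialEq)]
enum RegionError {
    NoLeader,
    Other(String),
}

/// Hypothetical exponential backoff with a bounded number of retries.
struct Backoff {
    current: Duration,
    max_delay: Duration,
    attempts_left: u32,
}

impl Backoff {
    fn new(base: Duration, max_delay: Duration, max_attempts: u32) -> Self {
        Backoff { current: base, max_delay, attempts_left: max_attempts }
    }

    /// Returns the delay before the next retry, or None once retries are exhausted.
    fn next_delay(&mut self) -> Option<Duration> {
        if self.attempts_left == 0 {
            return None;
        }
        self.attempts_left -= 1;
        let delay = self.current;
        // Double the delay for the next retry, capped at max_delay.
        self.current = (self.current * 2).min(self.max_delay);
        Some(delay)
    }
}

/// Runs `op`, retrying only on NoLeader and waiting between attempts.
/// Any other result (success or a different error) is returned immediately.
fn retry_on_no_leader<T>(
    mut backoff: Backoff,
    mut op: impl FnMut() -> Result<T, RegionError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, RegionError> {
    loop {
        match op() {
            Err(RegionError::NoLeader) => match backoff.next_delay() {
                Some(delay) => sleep(delay),
                None => return Err(RegionError::NoLeader),
            },
            other => return other,
        }
    }
}
```

In a blocking caller, `sleep` could simply be `std::thread::sleep`; an async client would await a timer instead. The cap on both the delay and the number of attempts is what keeps a long leaderless period from turning into a flood of PD requests.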

pingyu avatar Mar 11 '24 07:03 pingyu

@pingyu Thanks for your advice. Handling the NotLeader error in single_shard_handler alone is not enough. Shardable::shards, store_stream_for_keys, store_stream_for_range, store_stream_for_ranges and resolve_locks will also raise this error.

Covering all of them would require many modifications, which could take a lot of time and introduce more risk. Moreover, the application logic should already be able to retry and back off when it hits this error, so just refreshing the region cache seems reasonable.
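To make the "just refresh the region cache" idea concrete, here is a rough sketch under assumed types. `Region`, `RegionCache`, `CacheError`, and `get_with_leader` are hypothetical stand-ins, and `load_from_pd` is a closure standing in for a PD lookup; none of this is the client's real API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified region record for this sketch.
#[derive(Clone, Debug, PartialEq)]
struct Region {
    id: u64,
    ver: u64,
    leader: Option<u64>, // store id of the leader, if one is known
}

#[derive(Debug, PartialEq)]
enum CacheError {
    NoLeader(u64),
}

/// Hypothetical cache keyed by region id.
#[derive(Default)]
struct RegionCache {
    regions: HashMap<u64, Region>,
}

impl RegionCache {
    /// Returns a region that has a leader, reloading it when the cached
    /// entry is missing or leaderless.
    fn get_with_leader(
        &mut self,
        region_id: u64,
        mut load_from_pd: impl FnMut(u64) -> Region,
    ) -> Result<Region, CacheError> {
        // Serve from the cache only when the entry has a known leader.
        if let Some(region) = self.regions.get(&region_id) {
            if region.leader.is_some() {
                return Ok(region.clone());
            }
        }
        // Missing or leaderless entry: drop it and ask PD again.
        self.regions.remove(&region_id);
        let fresh = load_from_pd(region_id);
        if fresh.leader.is_none() {
            // Do not cache a leaderless region, so the next call re-queries PD
            // instead of waiting for the region version to change.
            return Err(CacheError::NoLeader(region_id));
        }
        self.regions.insert(region_id, fresh.clone());
        Ok(fresh)
    }
}
```

With this shape the no leader error still reaches the application, which retries and backs off on its own, but a later attempt can succeed as soon as PD reports a leader.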

yongman avatar Mar 12 '24 01:03 yongman