client-rust
"Connection refused" caused by stopping the TiKV process when the TiKV store is marked as offline
We mark a TiKV store as offline by calling the PD API to delete the store, which is a normal operation when shrinking TiKV nodes. Then we kill the TiKV process to simulate hardware damage. In a client that was started before the shrink operation, a scan call fails and prints "gRPC api error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }". However, in a newly started client, the same scan call returns the correct result. Likewise, when we start the TiKV process again, the old client returns the correct result too.
What versions of TiKV and client-rust are you using?
Which scan are you using, Txn or Raw? Please show the code where client-rust is used.
We use TiKV v8.0.1 and the latest client-rust code, with Raw scan. We call the Rust client through a C++ bridge, like this:
client_tikv = new tikv_client::RawKVClient(pd_vect);
auto kv_pairs = client_tikv->scan(start_marker, end_marker, max_to_get + 1, kTimeoutMs);
I found that directly killing a TiKV process can also trigger this phenomenon. I suspect it is related to the leaders on the killed TiKV node — the client may have incorrectly accessed the killed TiKV instance.
Scans with some start_marker values return normally. I suspect that this bug is triggered only when the Region being accessed has its leader located on the TiKV instance that was killed.
It seems that scan_inner is not handling the gRPC error properly. Similar to #419.
But the scan_inner function always calls the single_shard_handler function, and single_shard_handler contains a check for is_grpc_error.
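For readers following the thread, here is a minimal Rust sketch of the behavior being discussed. A client holds a cached Region leader address, a scan to that address fails because the store was killed, and the client either returns the error or invalidates the cache entry and retries with a fresh leader. This is not client-rust code. Every name in it (Pd, Cluster, Client, ScanError, scan_region, demo, the store addresses) is hypothetical, and it assumes PD already reports a surviving store as the new leader.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a transport-level gRPC failure
/// such as "Connection refused".
#[derive(Debug, PartialEq)]
enum ScanError {
    Unreachable(String),
}

/// Hypothetical stand-in for PD: the authoritative Region -> leader address map.
struct Pd {
    leaders: HashMap<u64, String>,
}

/// Hypothetical stand-in for the cluster: the store addresses that are still alive.
struct Cluster {
    alive: Vec<String>,
}

impl Cluster {
    /// Pretend to scan on one store; fails if that store is down.
    fn scan(&self, addr: &str) -> Result<String, ScanError> {
        if self.alive.iter().any(|a| a == addr) {
            Ok(format!("rows from {addr}"))
        } else {
            Err(ScanError::Unreachable(addr.to_string()))
        }
    }
}

/// Hypothetical client with a local Region -> leader cache.
struct Client {
    cache: HashMap<u64, String>,
}

impl Client {
    /// Scan one Region. On an unreachable store, drop the cached leader and
    /// ask PD again, up to `max_retries` times.
    /// Assumes PD knows a leader for `region` (indexing panics otherwise).
    fn scan_region(
        &mut self,
        region: u64,
        pd: &Pd,
        cluster: &Cluster,
        max_retries: usize,
    ) -> Result<String, ScanError> {
        let mut attempt = 0;
        loop {
            // Use the cached leader if present, otherwise load it from PD.
            let addr = match self.cache.get(&region) {
                Some(a) => a.clone(),
                None => {
                    let a = pd.leaders[&region].clone();
                    self.cache.insert(region, a.clone());
                    a
                }
            };
            match cluster.scan(&addr) {
                Ok(rows) => return Ok(rows),
                Err(_) if attempt < max_retries => {
                    // Invalidate the stale entry so the next loop refreshes it.
                    self.cache.remove(&region);
                    attempt += 1;
                }
                Err(e) => return Err(e),
            }
        }
    }
}

/// Two "old" clients with the same stale cache: one never retries, one retries once.
fn demo() -> (Result<String, ScanError>, Result<String, ScanError>) {
    let pd = Pd { leaders: HashMap::from([(1, "tikv-2:20160".to_string())]) };
    let cluster = Cluster { alive: vec!["tikv-2:20160".to_string()] };
    let stale = HashMap::from([(1, "tikv-1:20160".to_string())]);
    let mut no_retry = Client { cache: stale.clone() };
    let mut with_retry = Client { cache: stale };
    (
        no_retry.scan_region(1, &pd, &cluster, 0),
        with_retry.scan_region(1, &pd, &cluster, 1),
    )
}
```

In this toy model, the client that never invalidates its cache keeps hitting the killed store, which matches the symptom of an old client failing while a newly started client succeeds. Whether the real client follows this path depends on how the gRPC error is propagated inside client-rust, which this sketch does not model.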
@pingyu, I have submitted a PR (https://github.com/tikv/client-rust/pull/495); please review it. In my tests, it resolves this issue.