The TSO request may have a high latency after the leader changes

Open rleungx opened this issue 1 year ago • 2 comments

Enhancement Task

If a TSO request fails, the client tries to update the member list to find the new leader.

https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_dispatcher.go#L403-L416 https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_dispatcher.go#L436

There is also a backoff, with a minimum interval of 100ms: https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/pd_service_discovery.go#L532-L556

At the same time, new requests can still be put into the channel, where they wait to be handled: https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_client.go#L528

And the request might be affected by the backoff because we need to wait for the stream to be re-established.

rleungx avatar Nov 20 '24 10:11 rleungx

> And the request might be affected by the backoff because we need to wait for the stream to be re-established.

I think this behavior is expected. If the request does not wait for the reconnection, should it return an error directly instead?

AndreMouche avatar Dec 03 '24 18:12 AndreMouche

Dispatching TSO requests and reconnecting the stream are asynchronous. Even if we return an error, the caller still has to wait until the PD leader has switched and the stream is reconnected. We previously added a backoff to prevent PD from being overwhelmed by GetMember requests.

I think one way to mitigate the issue is to reduce the backoff interval.

rleungx avatar Dec 20 '24 04:12 rleungx