RATIS-1968. Remove unreached else block of trySendDelayed
What changes were proposed in this pull request?
In "https://github.com/apache/ratis/blob/9bd82aa2aa125a749b9357f208dd66533f43374c/ratis-common/src/main/java/org/apache/ratis/util/SlidingWindow.java#L363", the else block should not be reached, since only the first request is send out, all other request are still in the "delayedRequests", we can not receive out-of-order replies if "firstReplied" is not set.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-1968
Please replace this section with the link to the Apache JIRA)
How was this patch tested?
No tests.
Since testWithLoadAsync fails, it probably needs the else. Sorry that the code was long time ago. I might not remember the details. It may be related to failover to another server (i.e. leader change).
Yes, seems the variables are reset on exceptions. I will try to dive deeper, if needed, I'll close the PR then.
@symious , I recall now -- when there is a failover, there could possibly be multiple calls needed to be retried. Since the retires are async, some other calls can be retried before the first call.
In short, client calls are always in order but retries may be out of order.
@szetszwo The error seems caused by repeating resets on the slidingWindow, could you help to check?
The error seems caused by repeating resets on the slidingWindow, ...
Where is the code?
repeating reset.
https://github.com/apache/ratis/blob/cedcd2ad4cd2da13230aaa0d15678e5aee9b0729/ratis-client/src/main/java/org/apache/ratis/client/impl/OrderedAsync.java#L257C61-L257C61
Steps:
- Client send 100 async requests to server, here firstReplied is true
- Leader changed, 100 requests replied with exceptions, trying to reset the slidingWindow.
- Say the requests id are [100, 199], client receives "id100" and and reset the slidingWindow, firstReplied changed to false.
- Client retry on "id100" and receives the reply, so set firstReplied to true.
- Client then receives other requests' exception reply from Step2, then reset the slidingWindow again, but the connection is ok now, no need to reset again.
@szetszwo
The failAllAsyncRequests may fail with a different exception (AlreadyClosedException may work, or we should add a new AlreadyFailedException). For that exception, do not resetSlidingWindow.
@szetszwo I think failAllAsyncRequests is not invoked here, the slidingWindow is reset, then CompletionException(e) is thrown and failAllAsyncRequests is skipped. (Correct me if I'm wrong.)