Symious
> Can you add to the PR's description or as a comment, the new tickets that were raised from this PR?

@xBis7 Sure, updated in the description.

> can you please rebase, your branch is 400+ commits behind master.

Sure, I will try to rebase soon.
This PR has been split into https://github.com/apache/ozone/pull/4887 and https://github.com/apache/ozone/pull/3874, so I'm closing this one.
Yes, it seems the variables are reset on exceptions. I will dig deeper and close the PR if needed.
@szetszwo The error seems to be caused by repeated resets of the slidingWindow. Could you help check?
> repeating reset.

https://github.com/apache/ratis/blob/cedcd2ad4cd2da13230aaa0d15678e5aee9b0729/ratis-client/src/main/java/org/apache/ratis/client/impl/OrderedAsync.java#L257C61-L257C61

Steps:
1. The client sends 100 async requests to the server; here firstReplied is true.
2. The leader changed; the 100 requests are replied with exceptions, trying to reset the slidingWindow....
@szetszwo I think `failAllAsyncRequests` is not invoked here: the slidingWindow is reset, then `CompletionException(e)` is thrown, so `failAllAsyncRequests` is skipped. (Correct me if I'm wrong.)
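To make the suspected flow concrete, here is a minimal sketch. It is not the Ratis code: the class `RepeatedResetSketch`, its counters, and `onReplyException` are hypothetical stand-ins, and only the ordering (reset, then throw, then the skipped `failAllAsyncRequests`) is taken from the description above.

```java
import java.util.concurrent.CompletionException;

public class RepeatedResetSketch {
  // Hypothetical counters standing in for the real client state.
  int resetCount = 0;
  int failAllCount = 0;

  void resetSlidingWindow() {
    resetCount++;
  }

  void failAllAsyncRequests() {
    failAllCount++;
  }

  // Models the suspected order: reset first, then throw,
  // so the cleanup call at the end is never reached on failure.
  void onReplyException(Throwable e) {
    resetSlidingWindow();
    if (e != null) {
      throw new CompletionException(e);
    }
    failAllAsyncRequests();
  }

  // Simulates `requests` replies that all fail after a leader change.
  static RepeatedResetSketch simulate(int requests) {
    RepeatedResetSketch client = new RepeatedResetSketch();
    for (int i = 0; i < requests; i++) {
      try {
        client.onReplyException(new IllegalStateException("leader changed"));
      } catch (CompletionException ignored) {
        // Each failed reply ends here; nothing prevents the next reply from resetting again.
      }
    }
    return client;
  }

  public static void main(String[] args) {
    RepeatedResetSketch client = simulate(100);
    System.out.println("resets=" + client.resetCount + ", failAll=" + client.failAllCount);
  }
}
```

Under these assumptions, 100 failed replies produce 100 resets and no call to `failAllAsyncRequests`, which is the pattern I suspect in the real code path.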
@sodonnel Thanks for the review. The case where we hit the issue is as follows:
1. Four replicas in QUASI_CLOSE state from a single source.
2. OriginalSet will be 4 replicas,...
@sodonnel IMHO, expanding the excludedNodes list within the implementation of the chooseDatanodes method may indeed deviate from the original intent of the interface. Modifying these parameters in the implementation could...
> I think there are other places in the code where the placement policies are called too. Eg pipeline creation, which could run into the same sort of problems I...