ozone
ozone copied to clipboard
HDDS-10789. Check policy based on original replicas instead of eligibleReplicas
What changes were proposed in this pull request?
In RatisOverReplicationHandler, the eligibleReplicas might be trimmed due to uniqueness check, and the check of placement policy should be done on the original replicas set, instead of the trimmed eligibleReplicas.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10789
How was this patch tested?
Original tests.
I am not sure on this. @siddhantsangwan might remember more clearly, as I think he worked on it.
However, eligibleReplicas is the set of replias with pendingDelete and out of service removed. Ie, those replicas are going to go away "soon" so we should not consider them for over replication / placement or removal. So the set is trimmed down to that. Then we retain certain unhealthy replicas by also removing them from the set.
At the end of the trim, we have a set of replicas that are in the system and can be removed. If we use the original set for the placement check, then there are potentially replicas that are going to be removed in it, and so the check is not consistent with the future view.
By using the eligible replicas in the placement check, we are using all the healthy and not going to be deleted replicas, which seems like the correct thing to do.
@sodonnel Thanks for the review.
The case we met the issue is as follows:
- Four replicas in QUASI_CLOSE state from the single source.
- OriginalSet will be 4 replicas, eligibleSet has 3 replicas, one removed since for uniqueness.
- Then in comparing the policy validity, it will compare between 3 replicas and 2 replicas, this is not correct.
For inprogress moves, this should be removed here.
I think this change looks good in concept. If the comments from @ivandika3 are addressed it should be good to commit.
A unit test would be nice too, but I know it can be a little tricky to setup mis-replication scenarios in the unit tests.