HDDS-10715. Remove Decommission nodes on replication
What changes were proposed in this pull request?
For containers with insufficient replicas, a new target is chosen for replication.
Decommissioning nodes are excluded only at the end of target selection, so each one still consumes a retry (the retry count defaults to 3). As a result, containers cannot be replicated when a cluster has many nodes in the decommissioning state.
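To make the failure mode concrete, here is a minimal, self-contained sketch; every name in it is an illustrative stand-in, not Ozone's actual replication or placement-policy code:

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Minimal sketch of the failure mode; all names are illustrative stand-ins.
final class RetryWasteSketch {
  private static final Random RANDOM = new Random();

  /** Tries to pick a replication target, retrying up to maxRetries times. */
  static String chooseTarget(List<String> candidates,
      Predicate<String> isDecommissioning, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) { // default: 3
      String candidate = candidates.get(RANDOM.nextInt(candidates.size()));
      // A decommissioning node is rejected only *after* it has been chosen,
      // so each rejection still consumes one of the limited retries.
      if (!isDecommissioning.test(candidate)) {
        return candidate;
      }
    }
    // With many decommissioning nodes, all retries can be burned on invalid
    // picks and the under-replicated container never gets a new target.
    return null;
  }
}
```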
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10715
How was this patch tested?
Existing tests.
My first thought on this: since the placement policy already excludes decommissioning nodes via the isValidateNode method at the end, could we simply add all decommissioning nodes (and probably maintenance nodes too) to the exclude list at the start in the placement policy? That way it would fix the issue for all callers in one place.
E.g. we already have this as the entry point in SCMCommonPlacementPolicy, and I think it has access to NodeManager, so we could add to the exclude list right at the start:
```java
@Override
public final List<DatanodeDetails> chooseDatanodes(
    List<DatanodeDetails> usedNodes,
    List<DatanodeDetails> excludedNodes,
    List<DatanodeDetails> favoredNodes,
    int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
    throws SCMException {
  /*
  This method calls the chooseDatanodeInternal after fixing
  the excludeList to get the DatanodeDetails from the node manager.
  When the object of the Class DataNodeDetails is built from protobuf
  only UUID of the datanode is added which is used for the hashcode.
  Thus not passing any information about the topology. While excluding
  datanodes the object is built from protobuf @Link {ExcludeList.java}.
  NetworkTopology removes all nodes from the list which does not fall under
  the scope while selecting a random node. Default scope value is
  "/default-rack/" which won't match the required scope. Thus passing the proper
  object of DatanodeDetails(with Topology Information) while trying to get the
  random node from NetworkTopology should fix this. Check HDDS-7015
  */
  return chooseDatanodesInternal(validateDatanodes(usedNodes),
      validateDatanodes(excludedNodes), favoredNodes, nodesRequired,
      metadataSizeRequired, dataSizeRequired);
}
```
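For concreteness, that change could look roughly like the sketch below. The types and accessors are simplified stand-ins, not the real DatanodeDetails/NodeManager API:

```java
import java.util.List;

// Sketch of the suggestion above, using simplified stand-in types rather
// than Ozone's real DatanodeDetails/NodeManager API.
final class ExcludeAtEntrySketch {

  enum OperationalState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }

  record Node(String id, OperationalState state) { }

  /** Adds every node that is not IN_SERVICE to the exclude list up front. */
  static void excludeOutOfServiceNodes(List<Node> allNodes,
      List<Node> excludedNodes) {
    for (Node node : allNodes) {
      // Decommissioning (and maintenance) nodes can never be valid targets,
      // so excluding them here stops them from consuming retries later.
      if (node.state() != OperationalState.IN_SERVICE
          && !excludedNodes.contains(node)) {
        excludedNodes.add(node); // note: this mutates the caller's list
      }
    }
  }
}
```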
@sodonnel IMHO, expanding the excludedNodes list within the implementation of the chooseDatanodes method may indeed deviate from the original intent of the interface. Modifying these parameters in the implementation could confuse users of the interface, as they generally do not expect the parameters they pass to be altered.
But it is wrong for the placement policy to return a decommissioning node. It does indeed filter out any decommissioning nodes it finds and retries, so it is "removing them", but in a sub-optimal way.
The caller should not need to know every illegal node type it has to pass into the placement policy. What if another illegal node type is added in the future? Then we would have to modify all the callers, rather than a single place.
You are probably correct that it is not good to modify the passed parameter list, but we can copy it into a new list that is used inside the placement policy and add to that copy.
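Concretely, the non-mutating variant could be a small change to the sketch above (still illustrative, reusing the same stand-in types):

```java
// Non-mutating variant, as a drop-in replacement for the method in the
// sketch above (same stand-in types; also needs java.util.ArrayList).
static List<Node> withOutOfServiceNodesExcluded(List<Node> allNodes,
    List<Node> excludedNodes) {
  // Copy the caller's list first, then extend only the copy.
  List<Node> copy = excludedNodes == null
      ? new ArrayList<>() : new ArrayList<>(excludedNodes);
  for (Node node : allNodes) {
    if (node.state() != OperationalState.IN_SERVICE && !copy.contains(node)) {
      copy.add(node);
    }
  }
  return copy; // the caller's excludedNodes list is left untouched
}
```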
There are other places in the code where the placement policies are called too, e.g. pipeline creation, which I think could run into the same sort of problems. If we fix it inside the placement policy then it covers all existing scenarios.
@sodonnel Agreed. Updated the PR, PTAL.
@sodonnel PTAL.