HDDS-10715. Remove Decommission nodes on replication
What changes were proposed in this pull request?
For containers with insufficient replicas, a new target is chosen for replication.
Decommissioning nodes are excluded only at the end of target selection, so each one still consumes a retry (the retry count defaults to 3). As a result, containers cannot be replicated when a cluster has many nodes in the decommissioning state.
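To make the failure mode concrete, here is a minimal, self-contained sketch; every name in it is an illustrative stand-in, not Ozone's actual replication or placement-policy code:

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Minimal sketch of the failure mode; all names are illustrative stand-ins.
final class RetryWasteSketch {
  private static final Random RANDOM = new Random();

  /** Tries to pick a replication target, retrying up to maxRetries times. */
  static String chooseTarget(List<String> candidates,
      Predicate<String> isDecommissioning, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) { // default: 3
      String candidate = candidates.get(RANDOM.nextInt(candidates.size()));
      // A decommissioning node is rejected only *after* it has been chosen,
      // so each rejection still consumes one of the limited retries.
      if (!isDecommissioning.test(candidate)) {
        return candidate;
      }
    }
    // With many decommissioning nodes, all retries can be burned on invalid
    // picks and the under-replicated container never gets a new target.
    return null;
  }
}
```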
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10715
How was this patch tested?
Existing tests.
My first thought on this: since the placement policy already excludes decommissioning nodes via the isValidateNode method at the end, could we simply add all decommissioning nodes (and probably maintenance nodes too) to the exclude list at the start in the placement policy? That way it would fix the issue for all callers in one place.
E.g. we already have this as the entry point in SCMCommonPlacementPolicy, and I think it has access to NodeManager, so we could add to the exclude list right at the start:
```java
@Override
public final List<DatanodeDetails> chooseDatanodes(
    List<DatanodeDetails> usedNodes,
    List<DatanodeDetails> excludedNodes,
    List<DatanodeDetails> favoredNodes,
    int nodesRequired, long metadataSizeRequired, long dataSizeRequired)
    throws SCMException {
  /*
  This method calls the chooseDatanodeInternal after fixing
  the excludeList to get the DatanodeDetails from the node manager.
  When the object of the Class DataNodeDetails is built from protobuf
  only UUID of the datanode is added which is used for the hashcode.
  Thus not passing any information about the topology. While excluding
  datanodes the object is built from protobuf @Link {ExcludeList.java}.
  NetworkTopology removes all nodes from the list which does not fall under
  the scope while selecting a random node. Default scope value is
  "/default-rack/" which won't match the required scope. Thus passing the proper
  object of DatanodeDetails(with Topology Information) while trying to get the
  random node from NetworkTopology should fix this. Check HDDS-7015
  */
  return chooseDatanodesInternal(validateDatanodes(usedNodes),
      validateDatanodes(excludedNodes), favoredNodes, nodesRequired,
      metadataSizeRequired, dataSizeRequired);
}
```
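For concreteness, that change could look roughly like the sketch below. The types and accessors are simplified stand-ins, not the real DatanodeDetails/NodeManager API:

```java
import java.util.List;

// Sketch of the suggestion above, using simplified stand-in types rather
// than Ozone's real DatanodeDetails/NodeManager API.
final class ExcludeAtEntrySketch {

  enum OperationalState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }

  record Node(String id, OperationalState state) { }

  /** Adds every node that is not IN_SERVICE to the exclude list up front. */
  static void excludeOutOfServiceNodes(List<Node> allNodes,
      List<Node> excludedNodes) {
    for (Node node : allNodes) {
      // Decommissioning (and maintenance) nodes can never be valid targets,
      // so excluding them here stops them from consuming retries later.
      if (node.state() != OperationalState.IN_SERVICE
          && !excludedNodes.contains(node)) {
        excludedNodes.add(node); // note: this mutates the caller's list
      }
    }
  }
}
```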
@sodonnel IMHO, expanding the excludedNodes list within the implementation of the chooseDatanodes method may indeed deviate from the original intent of the interface. Modifying these parameters in the implementation could confuse users of the interface, as they generally do not expect the parameters they pass to be altered.
But it is wrong for the placement policy to return a decommissioning node. It does indeed filter out any decommissioning nodes it finds and retries, so it is "removing them", but in a sub-optimal way.
The caller should not need to know every illegal node type it has to pass into the placement policy. What if another illegal node type is added in the future? Then we would have to modify all the callers, rather than a single place.
You are probably correct that it is not good to modify the passed parameter list, but we can copy it into a new list that is used inside the placement policy and add to that copy.
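Concretely, the non-mutating variant could be a small change to the sketch above (still illustrative, reusing the same stand-in types):

```java
// Non-mutating variant, as a drop-in replacement for the method in the
// sketch above (same stand-in types; also needs java.util.ArrayList).
static List<Node> withOutOfServiceNodesExcluded(List<Node> allNodes,
    List<Node> excludedNodes) {
  // Copy the caller's list first, then extend only the copy.
  List<Node> copy = excludedNodes == null
      ? new ArrayList<>() : new ArrayList<>(excludedNodes);
  for (Node node : allNodes) {
    if (node.state() != OperationalState.IN_SERVICE && !copy.contains(node)) {
      copy.add(node);
    }
  }
  return copy; // the caller's excludedNodes list is left untouched
}
```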
There are other places in the code where the placement policies are called too, e.g. pipeline creation, which I think could run into the same sort of problems. If we fix it inside the placement policy then it covers all existing scenarios.
@sodonnel Agreed. Updated the PR, PTAL.
@sodonnel PTAL.