[GRAPH] Adding support for rail-optimized trees for MI3XX with 4 NICs
Details
Adding rail-optimized tree support for MI3XX configurations with only 4 NICs per node
Work item: "Internal", or link to GitHub issue (if applicable).
What were the changes?
Added a rail-optimized tree config to model_87, which is the model detected for MI3XX with 4 NICs per node.
Why were the changes made?
Keeping tree transfers on the same rail can potentially reduce the extra traffic that goes beyond the first layer of NIC switches.
How was the outcome achieved?
The change is a slight adaptation of the existing MI3XX 8-NIC rail-optimized tree configuration.
Additional Documentation:
Validation was done through topology explorer and RCCL_OUTPUT_TREES output:
Attached are images of the default trees (RCCL_DISABLE_RAIL_TREES=1) versus the trees built after this change, for a 4-node configuration.
In the second image, NIC transfers no longer jump rails (that is, the edges no longer change color).
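
To make the "no rail jumps" criterion concrete, here is a minimal Python sketch of the check the images illustrate. It is not RCCL code and does not parse RCCL_OUTPUT_TREES output. The `Endpoint` type, the `count_rail_jumps` helper, and both edge lists are hypothetical, and the edges are made-up examples rather than the trees shown in the attachments. The only assumption taken from this PR is that an inter-node edge "jumps rails" when the NIC index differs between the sending and receiving node.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """Hypothetical (node, NIC) pair; not an RCCL data structure."""
    node: int  # node index in the job
    nic: int   # NIC (rail) index on that node, 0..3 for a 4-NIC node


def count_rail_jumps(edges):
    """Count inter-node edges whose two endpoints sit on different rails."""
    return sum(
        1
        for src, dst in edges
        if src.node != dst.node and src.nic != dst.nic
    )


# Made-up 4-node chain where every hop leaves on NIC 0 and lands on NIC 1.
default_edges = [(Endpoint(n, 0), Endpoint(n + 1, 1)) for n in range(3)]

# Made-up rail-aligned chain where every hop stays on NIC 0.
rail_edges = [(Endpoint(n, 0), Endpoint(n + 1, 0)) for n in range(3)]

print(count_rail_jumps(default_edges))  # every hop changes rail
print(count_rail_jumps(rail_edges))     # no hop changes rail
```

A rail-optimized tree in this sense is one for which the count is zero.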
Approval Checklist
Do not approve until these items are satisfied.
- [ ] Verify the CHANGELOG has been updated, if
  - there are any NCCL API version changes,
  - any changes impact library users, and/or
  - any changes impact any other ROCm library.