UCT/IB/MLX5: Add multi_path config to enable Adaptive Routing
What
Add adaptive routing support for the rc_x and dc transports on RoCE, controlled by the multi_path and multi_path_force parameters.
Why ?
Need to implement UCX_IB_AR_ENABLE=auto/no/yes for RoCE.
How ?
Use PRM to add:
- HCA Cap/Cap2 multi_path force/rc/rcx/dc handling
- the force option allows overriding nvconfig/sm settings
- skip setting multipath parameters upon QP creation
- on RoCE, dc: set multipath parameters both for the DCT context and on the DC QP INIT2RTR transition, as specified in the PRM
- on RoCE, rc_x: set multipath parameters on the QP INIT2RTR transition, as specified in the PRM
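The capability handling above can be pictured as a small decision function. The following is a minimal sketch only, not the code in this PR: the type and function names (ar_mode_t, ar_caps_t, ar_resolve) are hypothetical, and the exact semantics of auto/no/yes (in particular, that "yes" falls back to the force capability when the per-transport bit is not set) are an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state mirroring UCX_IB_AR_ENABLE=auto/no/yes. */
typedef enum {
    AR_MODE_NO,
    AR_MODE_YES,
    AR_MODE_AUTO
} ar_mode_t;

/* Hypothetical summary of the HCA Cap/Cap2 bits for one transport. */
typedef struct {
    bool multi_path_tl;    /* multi_path reported for this transport (rc_x or dc) */
    bool multi_path_force; /* device allows forcing multi_path */
} ar_caps_t;

typedef enum {
    AR_RESULT_DISABLED,    /* do not set multipath parameters */
    AR_RESULT_ENABLED,     /* set multipath parameters on INIT2RTR */
    AR_RESULT_FORCED,      /* set them using the force option */
    AR_RESULT_UNSUPPORTED  /* requested explicitly, but no capability */
} ar_result_t;

/* Assumed policy: "no" always disables, "auto" follows the transport
 * capability, "yes" uses force when the transport capability is absent. */
static ar_result_t ar_resolve(ar_mode_t mode, const ar_caps_t *caps)
{
    assert(caps != NULL);

    switch (mode) {
    case AR_MODE_NO:
        return AR_RESULT_DISABLED;
    case AR_MODE_AUTO:
        return caps->multi_path_tl ? AR_RESULT_ENABLED : AR_RESULT_DISABLED;
    case AR_MODE_YES:
        if (caps->multi_path_tl) {
            return AR_RESULT_ENABLED;
        }
        return caps->multi_path_force ? AR_RESULT_FORCED
                                      : AR_RESULT_UNSUPPORTED;
    }

    return AR_RESULT_DISABLED;
}
```

Keeping the decision separate from QP setup fits the "skip setting multipath parameters upon QP creation" item: under this sketch the result would only be consulted at the INIT2RTR transition (and for the DCT context on dc).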
Tested
Tested UCX_IB_AR_ENABLE=auto/no/yes for rc_x and dc on a RoCE cluster. Could not confirm an actual throughput improvement.