UCT/IB/MLX5: Add multi_path config to enable Adaptive Routing

Open tvegas1 opened this issue 7 months ago • 3 comments

What

Add adaptive routing support for the rc_x and dc transports on RoCE, using the multi_path and multi_path_force parameters.

Why ?

Need to implement UCX_IB_AR_ENABLE=auto/no/yes for RoCE.

How ?

Use PRM to add:

  • HCA Cap/Cap2 multi_path force/rc/rcx/dc handling
    • force option allows overriding nvconfig/sm..
  • skip setting multipath parameters upon QP creation
  • on RoCE: dc: set multipath parameters both in the DCT context and on the DC QP INIT2RTR transition, as specified in PRM
  • on RoCE: rc_x: set multipath parameters on QP INIT2RTR transition as specified in PRM
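
The decision described by the list above can be sketched as follows. This is an illustration, not the code in this PR: every type and function name below is hypothetical, the capability struct is a simplified stand-in for the HCA Cap/Cap2 bits rather than the real PRM layout, and the policy itself is an assumption (auto enables multi_path only when the transport capability is reported, yes uses the force capability when available and otherwise fails if the transport is unsupported).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state mirroring UCX_IB_AR_ENABLE=auto/no/yes. */
typedef enum { AR_AUTO, AR_NO, AR_YES } ar_mode_t;

/* Hypothetical transport selector for the two transports in this PR. */
typedef enum { TL_RC_X, TL_DC } tl_t;

/* Hypothetical summary of the multi_path capability bits.
 * This is NOT the PRM field layout, only a simplified stand-in. */
typedef struct {
    bool multi_path_rc_x; /* device reports multi_path for rc_x */
    bool multi_path_dc;   /* device reports multi_path for dc */
    bool multi_path_force;/* device allows forcing multi_path */
} hca_mp_caps_t;

/* What to program on the INIT2RTR transition (and DCT context for dc). */
typedef struct {
    bool enable; /* set the multi_path parameter */
    bool force;  /* additionally set multi_path_force */
    bool error;  /* "yes" was requested but cannot be honored */
} mp_decision_t;

/* Assumed policy, see the lead-in paragraph for the caveats. */
static mp_decision_t resolve_multi_path(const hca_mp_caps_t *caps,
                                        ar_mode_t mode, tl_t tl)
{
    mp_decision_t d = { false, false, false };
    bool tl_cap;

    assert(caps != NULL);
    tl_cap = (tl == TL_DC) ? caps->multi_path_dc : caps->multi_path_rc_x;

    switch (mode) {
    case AR_NO:
        break; /* leave multipath parameters unset */
    case AR_AUTO:
        d.enable = tl_cap; /* never force in auto mode */
        break;
    case AR_YES:
        if (caps->multi_path_force) {
            d.enable = true;
            d.force  = true; /* override the device-level setting */
        } else if (tl_cap) {
            d.enable = true;
        } else {
            d.error = true; /* requested but unsupported */
        }
        break;
    }
    return d;
}
```

Under this sketch, nothing is set at QP creation time; the caller would apply the returned decision when it builds the INIT2RTR command, and for dc also when it creates the DCT context.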

Tested

Tested UCX_IB_AR_ENABLE=auto/no/yes for rc_x and dc on a RoCE cluster. Could not confirm an actual throughput improvement.

tvegas1 · Jul 15 '24 12:07