lower coll accelerator priority

Open bosilca opened this issue 1 year ago • 0 comments

The accelerator collective module (which allocates host memory and moves the data onto the host in order to complete collective communications) has a higher priority than some collective modules that natively support CUDA/ROCm (such as UCC). This leads to terrible performance for most users unless they manually exclude the accelerator collective (via --mca coll ^accelerator).
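For reference, the manual workaround mentioned above can be sketched as follows. The process count and the application name `./my_app` are placeholders, not taken from this issue; the environment-variable form relies on Open MPI's general `OMPI_MCA_<param>` convention for setting MCA parameters.

```shell
# Exclude the accelerator collective component for a single run.
# The caret is quoted so the shell passes it through unchanged.
# "-np 4" and "./my_app" are hypothetical placeholders.
mpirun --mca coll '^accelerator' -np 4 ./my_app

# Equivalent setting through the environment (OMPI_MCA_<param> convention).
export OMPI_MCA_coll='^accelerator'
mpirun -np 4 ./my_app
```

Either form has to be repeated by every user on every system, which is the usability problem this issue is about.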

This is definitely not user-friendly. We need to find a way to prevent the accelerator collective module from getting in the way of collective components that handle accelerator buffers natively.

bosilca, Jun 20 '24 16:06