ucc
Unified Collective Communication Library
V1.1.x of #616
V1.1.x of #600
What: Adds support for persistent collectives to the MPI tests.
Minor change to unify header includes in ec/rocm/kernel.
What: Optimizes the CUDA executor copy task. How: manual loop unrolling.
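Manual loop unrolling, as used for the copy task above, can be illustrated with a minimal host-side C sketch. This is not the UCC CUDA kernel: the function name `copy_unrolled4`, the unroll factor of 4 and the `float` element type are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy `count` floats from src to dst, manually unrolled by 4.
 * The main loop moves four elements per iteration, which cuts loop-control
 * overhead; the tail loop copies the remaining 0-3 elements. */
static void copy_unrolled4(float *dst, const float *src, size_t count)
{
    size_t i = 0;
    size_t main_end = count - (count % 4); /* largest multiple of 4 <= count */

    assert(count == 0 || (dst != NULL && src != NULL));

    for (; i < main_end; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    /* Remainder that does not fill a full group of four. */
    for (; i < count; i++) {
        dst[i] = src[i];
    }
}
```

The tail loop is what keeps the unrolled version correct when the element count is not a multiple of the unroll factor.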
What: Unifies pipelining parameters. Adds ucc_pipeline_params_t, an interface for users to set them, and a config variable parser. Why: Each time we add another pipelined algorithm we...
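To illustrate what a single shared set of pipeline parameters plus a config-string parser might look like, here is a minimal C sketch. Everything in it is hypothetical: `example_pipeline_params_t`, its fields, and the `key=value:key=value` syntax with keys `thresh`, `fragsize`, `nfrags` and `pdepth` are assumptions, not the actual `ucc_pipeline_params_t` layout or UCC config format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a unified pipeline parameter set; the real
 * ucc_pipeline_params_t fields and config syntax may differ. */
typedef struct example_pipeline_params {
    size_t   threshold; /* minimum message size (bytes) to enable pipelining */
    size_t   frag_size; /* size of each fragment in bytes */
    unsigned n_frags;   /* number of fragments in flight */
    unsigned pdepth;    /* pipeline depth */
} example_pipeline_params_t;

/* Parse a hypothetical "key=value:key=value" string into *p.
 * Returns 0 on success, -1 on an unknown key or malformed value
 * (in which case *p may be partially updated).
 * Keys absent from the string keep their previous values. */
static int example_parse_pipeline_params(const char *str,
                                         example_pipeline_params_t *p)
{
    char buf[256];
    char *tok, *eq, *end;
    unsigned long long val;

    assert(str != NULL && p != NULL);
    if (strlen(str) >= sizeof(buf)) {
        return -1; /* too long for the local copy */
    }
    strcpy(buf, str); /* strtok modifies its input, so work on a copy */

    for (tok = strtok(buf, ":"); tok != NULL; tok = strtok(NULL, ":")) {
        eq = strchr(tok, '=');
        if (eq == NULL) {
            return -1; /* token without '=' */
        }
        *eq = '\0';
        val = strtoull(eq + 1, &end, 10);
        if (end == eq + 1 || *end != '\0') {
            return -1; /* empty or non-numeric value */
        }
        if (strcmp(tok, "thresh") == 0) {
            p->threshold = (size_t)val;
        } else if (strcmp(tok, "fragsize") == 0) {
            p->frag_size = (size_t)val;
        } else if (strcmp(tok, "nfrags") == 0) {
            p->n_frags = (unsigned)val;
        } else if (strcmp(tok, "pdepth") == 0) {
            p->pdepth = (unsigned)val;
        } else {
            return -1; /* unknown key */
        }
    }
    return 0;
}
```

For example, parsing `thresh=65536:fragsize=8192:nfrags=4:pdepth=2` fills all four fields, while an unknown key makes the parse fail. Sharing one struct and one parser lets every pipelined algorithm pick up the same settings without duplicating parsing code.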
What: Properly handles potential failures during the TL context_create_epilog call. Why: Currently, if context_create_epilog fails, UCC context creation fails and the job fails. Expected behavior:...
What: Introduces the ability to use host-based reduction and copy operations. Why: This avoids the cost of a kernel launch, which can be beneficial for short messages...
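Here is a minimal C sketch of a host-side reduction, paired with a hypothetical size check for choosing it over a device kernel. The function names and the 4096-byte cutoff (`EXAMPLE_HOST_REDUCE_MAX_BYTES`) are assumptions for illustration only. The device path is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff below which the host path is chosen; the value is
 * illustrative only and would need tuning on real hardware. */
#define EXAMPLE_HOST_REDUCE_MAX_BYTES 4096

/* Element-wise sum of n_srcs source buffers into dst, computed on the CPU. */
static void host_reduce_sum_float(float *dst, const float *const *srcs,
                                  size_t n_srcs, size_t count)
{
    assert(n_srcs >= 1);
    for (size_t i = 0; i < count; i++) {
        float acc = srcs[0][i];
        for (size_t s = 1; s < n_srcs; s++) {
            acc += srcs[s][i];
        }
        dst[i] = acc;
    }
}

/* Returns 1 if a reduction of this size should take the host path,
 * 0 if it should be handed to a device kernel (not shown). */
static int use_host_path(size_t count, size_t elem_size)
{
    return count * elem_size <= EXAMPLE_HOST_REDUCE_MAX_BYTES;
}
```

Below the cutoff, the loop runs directly on the CPU and no kernel is launched. That is the saving described above for short messages.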
What: Adds a new TL/MLX5 with the minimal necessary TL interface stubs and little actual implementation (added in subsequent PRs). Adds the option to prefix the --with-tls list with a negation sign "^". Default list...
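The `^` negation can be sketched in C as follows. The semantics are an assumption modeled on a common convention, also used by UCX's `UCX_TLS` variable: `^a,b` means "everything in the default list except a and b". The helper names and the default list in the usage note are hypothetical, and the real `--with-tls` handling lives in the build system and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as an exact entry in the comma-separated
 * list `list`, 0 otherwise. */
static int in_list(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);
        if (tok_len == len && strncmp(p, name, len) == 0) {
            return 1;
        }
        if (comma == NULL) {
            break;
        }
        p = comma + 1;
    }
    return 0;
}

/* Decide whether TL `name` is selected by `spec` (assumed semantics):
 * "^a,b" selects every TL in `defaults` except a and b;
 * a spec without "^" selects exactly the listed TLs. */
static int tl_selected(const char *spec, const char *defaults,
                       const char *name)
{
    assert(spec != NULL && defaults != NULL && name != NULL);
    if (spec[0] == '^') {
        return in_list(defaults, name) && !in_list(spec + 1, name);
    }
    return in_list(spec, name);
}
```

With a hypothetical default list `ucp,cuda,nccl`, the spec `^nccl` selects `ucp` and `cuda`. A spec without `^`, such as `ucp,mlx5`, selects exactly those two TLs.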
What: Potential alternative to #596. This PR implements ALL the reductions (dt/ops) in the ec/cuda executor for persistent mode. It is done by making "device template" functions (common...
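The "device template" idea can be approximated in plain C with a generator macro: write each reduction once, then instantiate it for every datatype/operator pair. This is a stand-in technique. The PR targets CUDA, where the quoted "device template" functions presumably rely on C++ templates, and the macro and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise binary operators plugged into the generated functions. */
#define OP_SUM(a, b)  ((a) + (b))
#define OP_PROD(a, b) ((a) * (b))
#define OP_MAX(a, b)  ((a) > (b) ? (a) : (b))

/* "Template" macro: expands to one reduction function per (type, op) pair,
 * computing dst[i] = OP(src1[i], src2[i]) for i in [0, count). */
#define DEFINE_REDUCE(NAME, TYPE, OP)                                   \
    static void NAME(TYPE *dst, const TYPE *src1, const TYPE *src2,     \
                     size_t count)                                      \
    {                                                                   \
        assert(count == 0 ||                                            \
               (dst != NULL && src1 != NULL && src2 != NULL));          \
        for (size_t i = 0; i < count; i++) {                            \
            dst[i] = OP(src1[i], src2[i]);                              \
        }                                                               \
    }

/* Instantiate a few combinations; a full set would cover every dt/op pair. */
DEFINE_REDUCE(reduce_sum_int32, int32_t, OP_SUM)
DEFINE_REDUCE(reduce_max_int32, int32_t, OP_MAX)
DEFINE_REDUCE(reduce_prod_double, double, OP_PROD)
```

Each `DEFINE_REDUCE` line yields one concrete function. Covering all dt/op combinations then means listing instantiations rather than hand-writing each loop.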