ACCL
Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
To have FPGAs talk to non-FPGA hosts, we need a software implementation of the ACCL collectives protocol, initially on top of TCP and later on top of RDMA.
Currently a small exchange memory is mapped into both the Microblaze and host address spaces to hold configuration data. The size of this memory is limiting; e.g. some users...
Add support for hierarchical collectives within the confines of fanin == 1. Some examples:
- hierarchical rings for (all)gather, (all)reduce, scatter-reduce
- hierarchical trees for broadcast and scatter
I observed similar behaviour with other collectives, but so far I have only reproduced it with broadcast, so the title may be misleading. I will add comments on similar behaviour with other...
This issue concerns the branch to resolve issue 196: https://github.com/Xilinx/ACCL/tree/196-reduceallreduce-issues-on-cyt_rdma Gather sometimes swaps the outputs of the first rank and the second rank on two-node setups, when run on...
Initial steps to merge ACCL into Coyote v2
My work on the Coyote RDMA version and testing on NN
Work in progress on enabling Coyote v2 integration built without memory on the FPGA
Additional work on ACCL Coyote v2 integration. Note: to build with GPU support, refer to the comment in test/host/Coyote/test_gpu.cpp
Memory residency of eager RX buffers is now a runtime parameter