ACCL
Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
To have FPGAs talk to non-FPGA hosts, we need a software implementation of the ACCL collectives protocol, initially on top of TCP and later on top of RDMA.
Currently a small exchange memory is mapped into both the Microblaze and host address spaces to hold configuration data. The size of this memory is limiting; e.g. some users...
Add support for hierarchical collectives within the confines of fanin == 1. Some examples:
- hierarchical rings for (all)gather, (all)reduce, scatter-reduce
- hierarchical trees for broadcast and scatter
I observed similar behaviour with other collectives, but so far I have only reproduced it with broadcast, so the title may be misleading. I will add comments on similar behaviour with other...
This issue concerns the branch to resolve issue 196: https://github.com/Xilinx/ACCL/tree/196-reduceallreduce-issues-on-cyt_rdma Gather sometimes swaps the outputs of the first rank and the second rank on two-node setups, when run on...
Initial steps to merge ACCL into Coyote v2
My work on the Coyote RDMA version and testing on NN
Work in progress on enabling Coyote v2 integration built without memory on the FPGA
Additional work on ACCL Coyote v2 integration. Note: to build with GPU support, refer to the comment in test/host/Coyote/test_gpu.cpp
Memory residency of eager RX buffers is now a runtime parameter