
Proposal for local GPU communication merging

Open BichengYing opened this issue 5 years ago • 1 comments

The current win_ops logic is:

win_create -> gradient/iterate update -> win_put -> win_sync

The processing on each node/agent is almost entirely decoupled from, and independent of, the others.

We want to further optimize our communication for multi-machine cases. Communication between multiple GPUs within the same physical machine should be faster than communication between different machines. Further, we can use techniques such as NCCL and RDMA to speed it up. I suggest changing the process to:

Local machine leader: win_create -> gradient/iterate update -> Local Allreduce -> win_put -> win_sync
Local machine worker1: nothing ----> gradient/iterate update -> Local Allreduce ---- nothing
Local machine worker2: nothing ----> gradient/iterate update -> Local Allreduce ---- nothing
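
To make the proposed flow concrete, here is a minimal pure-Python simulation of one step. It is only a sketch and does not call the real bluefog API: `group_by_machine`, `local_allreduce`, and `hierarchical_step` are hypothetical helper names, each rank holds a single float instead of a tensor, and the uniform averaging weights in both the local allreduce and the leader-level combine are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_machine(machine_of_rank):
    """Map each machine id to the list of ranks running on it."""
    groups = defaultdict(list)
    for rank, machine in enumerate(machine_of_rank):
        groups[machine].append(rank)
    return dict(groups)


def local_allreduce(values, ranks):
    """Average the values held by `ranks` (stands in for an intra-machine allreduce)."""
    return sum(values[r] for r in ranks) / len(ranks)


def hierarchical_step(values, machine_of_rank, machine_neighbors):
    """Simulate one step: local allreduce on every machine, then a
    leader-only exchange between neighboring machines.

    Returns the value held by each machine's leader after the exchange.
    """
    groups = group_by_machine(machine_of_rank)

    # Step 1: every rank on a machine joins the local allreduce.
    local_avg = {m: local_allreduce(values, ranks) for m, ranks in groups.items()}

    # Step 2: only the leader talks across machines. Reading the neighbors'
    # local averages stands in for win_put; the combine stands in for win_sync.
    # Uniform weights are an assumption of this sketch.
    leader_value = {}
    for m in groups:
        received = [local_avg[n] for n in machine_neighbors[m]]
        leader_value[m] = (local_avg[m] + sum(received)) / (1 + len(received))
    return leader_value


# Four ranks on two machines; each machine has the other as its only neighbor.
values = [1.0, 3.0, 5.0, 7.0]
machine_of_rank = [0, 0, 1, 1]
machine_neighbors = {0: [1], 1: [0]}
result = hierarchical_step(values, machine_of_rank, machine_neighbors)
print(result)
```

In this example only two values cross the machine boundary per step (one per leader) instead of one per rank. How the combined value gets back from the leader to the local workers is not specified in the flow above, so the sketch leaves it out.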

BichengYing Apr 13 '20 01:04

The neighbor_allreduce version is done, based on machine ID.

BichengYing Oct 26 '20 00:10