singa
Can sparse all-reduce keep its efficiency with a large number of GPU workers?
In my opinion, when the GPU cluster scales up to several hundred workers, even high sparsification ratios still generate significant communication overhead, which can end up even worse than dense all-reduce.
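To make the concern concrete, here is a rough back-of-envelope model (my own sketch, not SINGA's implementation). It assumes dense gradients are exchanged with a ring all-reduce, while sparse gradients are exchanged by all-gathering each worker's (index, value) pairs, which is a common way to implement sparse all-reduce since the non-zero positions differ across workers. Under these assumptions, the dense cost per worker is roughly constant in the number of workers N, while the sparse cost grows linearly with N, so beyond some cluster size sparse communication overtakes dense:

```python
# Assumed cost model (not taken from SINGA):
# - dense ring all-reduce moves ~2*(N-1)/N * M values per worker
# - sparse all-reduce via allgather of (index, value) pairs moves
#   ~(N-1) * 2*k*M values per worker, where k is the density
#   (fraction of gradient entries kept after sparsification);
#   the factor 2 accounts for sending indices alongside values.

def dense_allreduce_volume(M: int, N: int) -> float:
    """Per-worker traffic of a ring all-reduce over M dense values."""
    return 2 * (N - 1) / N * M

def sparse_allgather_volume(M: int, N: int, density: float) -> float:
    """Per-worker traffic when every worker allgathers its k*M
    (index, value) pairs from all N workers."""
    return (N - 1) * 2 * density * M

M = 25_000_000  # hypothetical 25M-parameter model
for N in (8, 64, 256, 1024):
    for density in (0.01, 0.001):
        ratio = sparse_allgather_volume(M, N, density) / dense_allreduce_volume(M, N)
        print(f"N={N:5d}  density={density:<6}  sparse/dense = {ratio:.2f}")
```

Under this model the crossover is roughly at N ≈ 1/(2k): with 1% density, sparse traffic already exceeds dense at a few hundred workers, which matches the intuition in the question. The model ignores index compression and overlapping-sparsity tricks, which real systems use to push that crossover out.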