Laladong

1 issue by Laladong

In Section 3.3.1, "All-to-all based implementation": "Once a GPU receives gradients from its predecessor, we dequantize it to recover full precision and conduct a local reduction." I would like...

documentation
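
For readers unfamiliar with the step quoted in the issue, the following is a minimal sketch of "dequantize, then reduce locally". It is not the paper's or the library's implementation. It assumes symmetric 8-bit uniform quantization with one scale per chunk, and the names `quantize`, `dequantize`, and `local_reduce` are hypothetical helpers introduced only for illustration. Communication is simulated with plain Python lists rather than a real collective.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric uniform quantization (assumed scheme): integer codes plus one scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate full-precision values from codes and scale."""
    return [c * scale for c in codes]


def local_reduce(received: List[Tuple[List[int], float]]) -> List[float]:
    """Dequantize every received chunk, then sum element-wise in full precision."""
    total = [0.0] * len(received[0][0])
    for codes, scale in received:
        for i, v in enumerate(dequantize(codes, scale)):
            total[i] += v
    return total


# Simulated chunks that three peers would send to this GPU (illustrative values).
peer_grads = [[0.5, -1.0, 0.25], [1.0, 0.5, -0.5], [-0.25, 0.75, 1.0]]
received = [quantize(g) for g in peer_grads]  # what arrives over the wire
reduced = local_reduce(received)              # dequantize + local reduction
exact = [sum(col) for col in zip(*peer_grads)]
```

Summing after dequantization keeps the accumulation in floating point, so the only error in `reduced` relative to `exact` is the per-chunk rounding introduced by `quantize`.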