wangyao

Results: 1 comment by wangyao

> Q: Does each rank need to maintain the same gate output? Each rank's input is its "local batch of data", and its gating output will also be its "local gating...