wangyao
> Q: Does each rank need to maintain the same gate output? Each rank's inputs are a "local batch of data", so its gating output will also be its "local gating...