wlike

3 issues by wlike

How many tokens was the glm-10b-chinese model trained on during pre-training? Thanks.

When sequence parallelism is enabled alongside tensor parallelism during training with Megatron, there are multiple copies of the RMSNorm or LayerNorm parameters, and they...
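The issue above is truncated, but the underlying mechanism can be illustrated. Under sequence parallelism, each tensor-parallel rank applies the norm to a different slice of the sequence. Each rank's copy of the norm weight therefore only sees a partial gradient. The usual remedy is to sum (all-reduce) these gradients across the tensor-parallel group. The sketch below is a minimal, framework-free illustration of that idea, not Megatron-LM's code. The helper names (`rmsnorm_weight_grad`, `split_sequence`, `all_reduce_sum`) are hypothetical, and the all-reduce is simulated with plain Python lists instead of a real collective.

```python
import math

EPS = 1e-6


def rmsnorm_weight_grad(tokens, upstream):
    """Gradient of sum(upstream * RMSNorm(tokens)) w.r.t. the RMSNorm weight.

    dL/dw_i = sum over tokens t of upstream[t][i] * x[t][i] / rms(x[t]).
    """
    hidden = len(tokens[0])
    grad = [0.0] * hidden
    for x, g in zip(tokens, upstream):
        rms = math.sqrt(sum(v * v for v in x) / hidden + EPS)
        for i in range(hidden):
            grad[i] += g[i] * x[i] / rms
    return grad


def split_sequence(seq, world_size):
    """Hypothetical helper: contiguous sequence shards, one per TP rank."""
    chunk = len(seq) // world_size
    return [seq[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_reduce_sum(per_rank_grads):
    """Simulated all-reduce (sum): every rank receives the same total."""
    total = [sum(vals) for vals in zip(*per_rank_grads)]
    return [list(total) for _ in per_rank_grads]


# Toy activations and upstream gradients: 4 tokens, hidden size 3.
tokens = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [3.0, 0.0, -2.0], [-1.0, 1.0, 1.0]]
upstream = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1], [-0.3, 0.2, 0.5], [0.2, 0.2, 0.2]]
world_size = 2

# Reference: gradient over the whole sequence on a single device.
full = rmsnorm_weight_grad(tokens, upstream)

# Sequence parallel: each rank's replica of the weight sees only its shard.
local = [
    rmsnorm_weight_grad(t, g)
    for t, g in zip(split_sequence(tokens, world_size),
                    split_sequence(upstream, world_size))
]

# Summing across the group restores the full-sequence gradient on every rank.
reduced = all_reduce_sum(local)

print(local[0] == local[1])  # False: replicas would be updated differently
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(reduced[0], full)))  # True
```

Without the reduction step, each replica would take an optimizer step with a different gradient and the copies would drift apart; after it, all replicas hold identical gradients that match the single-device result.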

![image](https://cloud.githubusercontent.com/assets/6084282/4338043/350c952a-4015-11e4-8ea8-1972714eed47.png)