Wenwen Qu

Results 5 issues of Wenwen Qu

I notice the L2 norm for experts is reduced twice in model parallel group, please see: https://github.com/laekov/fastmoe/blob/cd8372b3a8a5e73d46d2b463ec30995631cfc181/examples/megatron/clip-grad-v2.2.patch#L44C2-L44C2. It is a good ideas to add up the square gradients of all...

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...