fastmoe

pytest error

Open R-QinQ opened this issue 1 year ago • 3 comments

I found that the moe is 0, but I don't know why. [image]

R-QinQ avatar Dec 29 '23 08:12 R-QinQ

Which test is this error produced by?

laekov avatar Dec 29 '23 08:12 laekov

Which test is this error produced by?

It is produced by the test_fmoe_linear_distributed() function in test_ddp.py, and it fails for all of the test parameters. [image]

R-QinQ avatar Dec 29 '23 08:12 R-QinQ

I am not able to reproduce this issue. You may need to verify that the NCCL version of your PyTorch matches the NCCL version that you used to compile FastMoE. You can get PyTorch's NCCL version with print(torch.cuda.nccl.version()).

laekov avatar Dec 29 '23 09:12 laekov
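
For anyone hitting the same failure, the version check suggested above could be sketched roughly as follows. This is only an illustration, not part of FastMoE: the helper names (normalize_nccl_version, nccl_versions_match, torch_nccl_version) are hypothetical, and the NCCL version that FastMoE was compiled against has to be supplied by you, for example by reading it from the nccl.h of the NCCL installation used at build time. The sketch assumes that torch.cuda.nccl.version() returns either a (major, minor, patch) tuple or, on older PyTorch releases, NCCL's integer version code.

```python
def normalize_nccl_version(v):
    """Return (major, minor, patch) from a tuple or an NCCL integer version code."""
    if isinstance(v, int):
        # NCCL encodes versions up to 2.8 as major*1000 + minor*100 + patch,
        # and later versions as major*10000 + minor*100 + patch.
        if v < 10000:
            return (v // 1000, (v % 1000) // 100, v % 100)
        return (v // 10000, (v % 10000) // 100, v % 100)
    # Tuple form: keep the first three numeric fields, drop any suffix.
    return tuple(int(x) for x in v[:3])


def nccl_versions_match(torch_version, build_version):
    """Compare two NCCL versions given in either tuple or integer form."""
    return normalize_nccl_version(torch_version) == normalize_nccl_version(build_version)


def torch_nccl_version():
    """Return PyTorch's NCCL version as a tuple, or None if it cannot be queried."""
    try:
        import torch
        return normalize_nccl_version(torch.cuda.nccl.version())
    except Exception:
        # PyTorch is missing or was built without NCCL support.
        return None
```

A possible use is to call torch_nccl_version() and pass the result to nccl_versions_match() together with the version you built FastMoE against. A mismatch does not prove that this is the cause of the test failure, but it is a cheap thing to rule out first.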