Kanghao Chen
The URL for the pretrained model appears to be broken. Could you please provide a working link to download the model? Thank you!
I ran into the same error. How did you fix it?
> > > Roughly how long was it stuck? This code has not been optimized for MoE, so training is indeed fairly slow; the 30B MoE runs at about the same speed as the 38B model, so it may not actually be stuck. > > > > > > [@Weiyun1025](https://github.com/Weiyun1025) After 30 minutes, NCCL timed out: > > [rank3]:[E902 12:24:58.761850774 ProcessGroupNCCL.cpp:632] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=77419, OpType=_ALLGATHER_BASE, NumelIn=98304, NumelOut=1572864,...
Hello, I have the same request. Did you get the raw dataset?