Kanghao Chen

4 comments by Kanghao Chen

The URL of the pretrained model appears to be broken. Could you please provide a working URL to download the model? Thank you!

I'm hitting the same error. How did you fix it?

> > > How long was it stuck? This codebase has not been optimized for MoE, so training is indeed fairly slow. The 30B MoE model runs at roughly the same speed as the 38B model, so it may not actually be hung. > > > > > > [@Weiyun1025](https://github.com/Weiyun1025) After 30 min, NCCL timed out: > > [rank3]:[E902 12:24:58.761850774 ProcessGroupNCCL.cpp:632] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=77419, OpType=_ALLGATHER_BASE, NumelIn=98304, NumelOut=1572864,...

Hello, I have the same request. Did you get the raw dataset?