Yizhou Wang

24 comments by Yizhou Wang

@loadams Hi, could you please help trigger the CI? My CLA was reviewed and approved today. Thank you!

@tjruwase Hi, we found a bug in DeepSpeed: when enabling tensor parallelism (TP = 2) for a 20B Megatron-DeepSpeed model on 4 nodes, we hit the following error: `RuntimeError: Global rank 0 is not part...`

> @YizhouZ, thanks for this PR. Apologies for the delay as we resolve some CI issues. We plan to merge soon.

@tjruwase Thanks!

> @YizhouZ, do you know why this is not a problem for ZeRO stage 1 or 2?

Hi @tjruwase, only stage 3 triggers this `post_init_method`; the other stages do not go...

> @YizhouZ, thanks for the confirmation. That makes sense since TP>1 is not very well tested with ZeRO stage 3. This certainly shows a gap in our unit tests...

@tjruwase Could you please help trigger the CI? My CLA was reviewed and approved today. Thank you!

@tjruwase I fixed the failing CI case. Could you please take a look? Thank you!

Hi @tjruwase, the current CI failure does not seem to be caused by my changes: the previous check passed, but the latest one failed, and the difference...

Thanks for triggering the CI. Do you have any comments on this PR? @loadams @tjruwase

> Thanks for triggering CI. Do you have comments on this PR? @loadams @tjruwase

Hi @loadams @tjruwase, this PR does not seem to be in the merge queue. Could you give us some...