GradientGuru

17 comments by GradientGuru

> Like we had for BF16Optimizer. Does it mean DeepSpeed already supports automatically changing the world size for ZeRO-1/2/3 if I use `"bf16": { "enabled": true }`?
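For context, this is the kind of DeepSpeed config the question refers to. A minimal sketch, written as a Python dict for illustration; the `bf16.enabled`, `zero_optimization.stage`, and `optimizer` keys are standard DeepSpeed config options, while the toy model and batch-size value are arbitrary placeholders:

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # toy model just to make the sketch self-contained

ds_config = {
    "bf16": {"enabled": True},            # bf16 mixed precision instead of fp16
    "zero_optimization": {"stage": 1},    # the ZeRO stage under discussion
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
}

# deepspeed.initialize accepts the config as a dict or as a path to a
# ds_config.json file; run this under the deepspeed launcher.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```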

> @stas00, I believe ZeRO stage 1 should be supported: #2284. I think stage 2 might be working or pretty close. I do plan to work on this, so...

> No, it's not. It's currently a confusing situation, as `BF16Optimizer` was written specifically for Megatron-Deepspeed when we trained BLOOM-176B, so it works only in that framework. Is it possible...

> But, as I have just explained, the ZeRO case is much simpler than TP/DP/PP, so it should be relatively easy to make it work with just the ZeRO files. I...

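Related to working with just the ZeRO files: DeepSpeed already ships a consolidation utility, `zero_to_fp32`, that reconstructs a single fp32 state dict from the sharded ZeRO checkpoint files. That is consolidation rather than reshaping to a new world size, but it operates on the same files. A minimal sketch, assuming a hypothetical checkpoint directory written by `engine.save_checkpoint`:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# "checkpoints/" is a hypothetical path to a directory produced by
# engine.save_checkpoint(); it contains the sharded ZeRO state files.
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints/")

# The consolidated fp32 weights can now be saved or loaded into a plain
# torch model, independently of the original training world size.
torch.save(state_dict, "consolidated_fp32.pt")
```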

Updating npm to the newest version (e.g. `npm install -g npm@latest`) will solve the problem.

I have the same problem. In the log, it seems it has been exploring and getting resources (not just planning; it already got the resource it planned to get). The following...

baichuan-7B is a base model, so we suggest trying fine-tuning or using it in a few-shot style. For similar questions, you can join the WeChat group for discussion.
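For the few-shot suggestion, a minimal sketch using Hugging Face `transformers`; the model id `baichuan-inc/Baichuan-7B` and `trust_remote_code=True` follow the model's public card, `device_map="auto"` assumes `accelerate` is installed, and the prompt contents are an arbitrary illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan-7B"  # public HF model id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# Few-shot prompting: a base model is not instruction-tuned, so we prepend
# worked examples and let it continue the pattern.
prompt = (
    "Translate Chinese to English.\n"
    "中文: 你好 -> English: Hello\n"
    "中文: 谢谢 -> English: Thank you\n"
    "中文: 早上好 -> English:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```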