GradientGuru
> like we had for BF16Optimizer. Does it mean DeepSpeed already supports automatically changing the world size for ZeRO-1,2,3 if I use `"bf16": { "enabled": true }`?
> @stas00, I believe ZeRO stage 1 should be supported: #2284. I think stage 2 might be working or pretty close. I do plan to work on this, so...
> No, it's not. It's currently a confusing situation as `BF16Optimizer` was written specifically for Megatron-Deepspeed when we trained BLOOM-176B, so it works only in that framework. Is it possible...
> But as I have just explained the ZeRO case is much simpler than TP/DP/PP so it should be relatively easy to make it work with just ZeRO files. I...
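For context, here is a minimal sketch of the kind of DeepSpeed config being discussed, i.e. bf16 combined with ZeRO stage 1. Only the `"bf16": { "enabled": true }` block is quoted from the thread; the `zero_optimization` and batch-size fields are standard DeepSpeed config keys added here for illustration, not something the thread confirms resolves the world-size question.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1
  }
}
```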
Updating npm to the newest version will solve the problem.
I have the same problem. In the log it seems it has been exploring and getting resources (not just planning, it already got the resource it planned to get); the following...
It's mentioned in the README.
baichuan-7B is a base model; we suggest trying fine-tuning or using it in a few-shot format. For similar questions, you can join the WeChat group to discuss.
Please share your inference code so we can take a look.