GradientGuru

17 comments by GradientGuru

> Like we had for BF16Optimizer. Does it mean DeepSpeed already supports automatically changing the world size for ZeRO-1/2/3 if I use `"bf16": { "enabled": true }`?
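For context, this is the kind of DeepSpeed config the question refers to. A minimal sketch, written as a Python dict for illustration; the `bf16.enabled`, `zero_optimization.stage`, and `optimizer` keys are standard DeepSpeed config options, while the toy model and batch-size value are arbitrary placeholders:

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # toy model just to make the sketch self-contained

ds_config = {
    "bf16": {"enabled": True},            # bf16 mixed precision instead of fp16
    "zero_optimization": {"stage": 1},    # the ZeRO stage under discussion
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
}

# deepspeed.initialize accepts the config as a dict or as a path to a
# ds_config.json file; run this under the deepspeed launcher.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```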

> @stas00, I believe ZeRO stage 1 should be supported: #2284. I think stage 2 might be working or pretty close. I do plan to work on this, so...

> No, it's not. It's currently a confusing situation, as `BF16Optimizer` was written specifically for Megatron-Deepspeed when we trained BLOOM-176B, so it works only in that framework. Is it possible...

> But, as I have just explained, the ZeRO case is much simpler than TP/DP/PP, so it should be relatively easy to make it work with just the ZeRO files. I...

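Related to working with just the ZeRO files: DeepSpeed already ships a consolidation utility, `zero_to_fp32`, that reconstructs a single fp32 state dict from the sharded ZeRO checkpoint files. That is consolidation rather than reshaping to a new world size, but it operates on the same files. A minimal sketch, assuming a hypothetical checkpoint directory written by `engine.save_checkpoint`:

```python
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# "checkpoints/" is a hypothetical path to a directory produced by
# engine.save_checkpoint(); it contains the sharded ZeRO state files.
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints/")

# The consolidated fp32 weights can now be saved or loaded into a plain
# torch model, independently of the original training world size.
torch.save(state_dict, "consolidated_fp32.pt")
```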

Updating npm to the newest version (e.g. `npm install -g npm@latest`) will solve the problem.

I have the same problem. In the log, it seems it has been exploring and getting resources (not just planning; it already got the resource it planned to get). The following...

baichuan-7B is a base model, so we suggest trying fine-tuning or using it in a few-shot style. For similar questions, you can join the WeChat group for discussion.
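For the few-shot suggestion, a minimal sketch using Hugging Face `transformers`; the model id `baichuan-inc/Baichuan-7B` and `trust_remote_code=True` follow the model's public card, `device_map="auto"` assumes `accelerate` is installed, and the prompt contents are an arbitrary illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan-7B"  # public HF model id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# Few-shot prompting: a base model is not instruction-tuned, so we prepend
# worked examples and let it continue the pattern.
prompt = (
    "Translate Chinese to English.\n"
    "中文: 你好 -> English: Hello\n"
    "中文: 谢谢 -> English: Thank you\n"
    "中文: 早上好 -> English:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```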