renhouxing comments

Results: 4 comments of renhouxing

[QUESTION] Why megatron-core seems slower and use more gpu mem than legacy for gpt_pretrain?

A possible reason is that the local mcore model does not support flash-attn. https://github.com/NVIDIA/Megatron-LM/blob/core_v0.6.0/megatron/core/models/gpt/gpt_layer_specs.py#L53

[BUG] grad_norm and loss is nan when deepspeed==0.13.5 but ok with deepspeed==0.10.2

@loadams I also encountered the same problem. More experiments:
deepspeed==0.12.4, zero-2, multi-node: N (grad_norm is always 1.0 and loss is 0)
deepspeed==0.12.4, zero-2, one-node: Y
deepspeed==0.12.4, zero-3, multi-node: Y
deepspeed==0.12.4, zero-3,...

"lm_head.weight" not in the parameters of Starcoder2-3B and Starcoder2-7B (Huggingface version)

ok, thanks for your response!

🤗 [REQUEST] - ReflectionCoder

Hi @ganler, it's been a week since I submitted the request. I've attached the eval script and the raw outputs. Is there anything else I can do to speed up...