
Ongoing research training transformer language models at scale, including: BERT & GPT-2

Results: 124 Megatron-DeepSpeed issues

Are there any other layer-norm functions available, such as RMSNorm or DeepNorm?
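For reference, RMSNorm itself is simple to state; below is a minimal pure-Python sketch of the operation (the function name and signature are illustrative, not Megatron-DeepSpeed's actual API):

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Root-mean-square layer normalization (Zhang & Sennrich, 2019).

    Unlike standard LayerNorm, RMSNorm skips mean-centering and the
    bias term: y_i = x_i / sqrt(mean(x**2) + eps) * g_i.
    """
    if gain is None:
        gain = [1.0] * len(x)  # default gain of 1 leaves scale learning to training
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gain)]
```

DeepNorm, by contrast, is not a different normalization function but a residual-scaling scheme applied around standard LayerNorm, so it would be wired in at the transformer-block level rather than as a drop-in norm replacement.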

Specifically, I am looking for a script with DeepSpeed PP and ZeRO-DP like this: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp In my understanding, this script should be able to load BLOOM with some changes, for example...
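For context, in Megatron-DeepSpeed the pipeline-parallel degree is typically set via launcher flags while ZeRO-DP is configured in the DeepSpeed JSON config. A minimal sketch, assuming ZeRO stage 1 (values are illustrative; note that ZeRO stages above 1 are generally incompatible with pipeline parallelism):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "zero_optimization": { "stage": 1 },
  "fp16": { "enabled": true }
}
```

with the parallel layout passed on the command line, e.g. `--tensor-model-parallel-size 1 --pipeline-model-parallel-size 2`.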

# Problem
When pretraining GPT-like models using this [script](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/7b5f175b73a12d602cdadd3ee205ed666f6d4234/pretrain_gpt.py) with **tensor_parallelism (TP)=1 and pipeline_parallelism (PP)>1**, for a model of any size and batch size, I get the following user warning multiple...