Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Results 124 Megatron-DeepSpeed issues

Hi, I want to continue training of the Bloom model. To start simple, I want to load the 1.1B model into the BigScience Megatron-DeepSpeed library. I tried to run pretrain_gpt.py...

When I run this [https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11f-6B3-ml-continuation.slurm](https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/smaller_models/tr11f-6B3-ml-continuation.slurm) script to continue training the model, I get a strange grad norm and a huge loss after the loss scale is automatically reduced on overflow. Just like I am...

This PR is a step towards generalizing the universal checkpointing approach that enables arbitrary reshapes of 3D parallel checkpoints. This PR eliminates the hardcoding of BLOOM model architecture in the...

Hey, thank you for the scripts for loading checkpoints and running benchmarks. I have a strange issue: ds_inference fp16 throughput is considerably slower than the results mentioned. But, the...

Hi folks! Did anyone encounter [this error](https://github.com/NVIDIA/apex/issues/1458) when installing Apex on Windows? Trying to create a YouTube video covering this codebase but this one is blocking me. I know this...

This is still too hacky to be merged 😁 cc @TevenLeScao Edit: Will make it less hacky on the other branch so merging this

In training, I used swiglu, TP=4, PP=2. I used deepspeed_to_deepspeed.py to convert the checkpoint to a TP=1, PP=1 one. When evaluating the resulting checkpoint, I found that the accuracy...

Hi, what's the process for finetuning BLOOM? Has anyone succeeded and is willing to share the code? Thanks!

Hi, thanks for the great work. I was able to run inference on the BLOOM 7.1 model with 24 GB of GPU memory. Can we train the BLOOM models using tensor parallelism and...

Bumps [black](https://github.com/psf/black) from 21.4b0 to 24.3.0. Release notes Sourced from black's releases. 24.3.0 Highlights This release is a milestone: it fixes Black's first CVE security vulnerability. If you run Black...

dependencies