Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Results 124 Megatron-DeepSpeed issues

to make it easy for new users to use our setup

enhancement

We now have a script that converts Megatron-DeepSpeed checkpoints to HF Transformers checkpoints. The project is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) and the script is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py). However, the script doesn't have unit tests that confirm that...

Good First Issue

I just started to look at how to adapt [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/51a2e916b730cf676c66532b19d973a603377cb0/deepspeed/utils/zero_to_fp32.py) to extract fp32 weights from optimizer states. I will park this for now since it was said today fp16 weights...

This extends ``tools/preprocess_dataset_dist.py`` to handle JSONL files as an input dataset. It defines a new IndexedJSON class in ``tools/indexed_json.py`` that creates and uses an index for the JSONL file. The...
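To illustrate the general idea of indexing a JSONL file for random access, here is a minimal sketch. It is not the `IndexedJSON` class from the PR, whose details are not shown here; the class name `JsonlOffsetIndex` and the `demo` helper are hypothetical, and the sketch assumes one JSON object per line and an index small enough to hold in memory.

```python
import json
import os
import tempfile


class JsonlOffsetIndex:
    """Hypothetical helper: random access to JSONL records via byte offsets."""

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # Read in binary mode so that offsets are exact byte positions.
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                if line.strip():  # skip blank lines
                    self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Seek straight to the record instead of scanning the file.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return json.loads(f.readline().decode("utf-8"))


def demo():
    """Write three records to a temporary file and fetch the second one."""
    records = [{"text": "first"}, {"text": "second"}, {"text": "third"}]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    ) as tmp:
        for record in records:
            tmp.write(json.dumps(record) + "\n")
        path = tmp.name
    try:
        index = JsonlOffsetIndex(path)
        return len(index), index[1]
    finally:
        os.remove(path)
```

Building the index costs one pass over the file. After that, each lookup is a single seek and one line read, which is what makes a distributed preprocessing job able to hand out arbitrary record ranges to workers.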

Currently, parameter counts in `utils.get_parameters_in_billions` are inaccurate when PP > 1. Tied variables, in particular embedding layers, exist in several copies in the first and last PP stage, which causes...
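The double counting described above can be shown with a toy example. This is only a sketch under stated assumptions, not the fix applied to `utils.get_parameters_in_billions`: the function `count_unique_parameters`, the per-stage dictionaries, and the parameter names are all hypothetical, and the sketch assumes tied parameters can be identified by name across pipeline stages.

```python
def count_unique_parameters(stage_params, tied_names):
    """Sum parameter counts over pipeline stages, counting tied weights once.

    stage_params: list of {param_name: numel} dicts, one per pipeline stage.
    tied_names:   set of names whose weights are shared across stages.
    """
    total = 0
    seen_tied = set()
    for params in stage_params:
        for name, numel in params.items():
            if name in tied_names:
                if name in seen_tied:
                    continue  # a copy of this tied weight was already counted
                seen_tied.add(name)
            total += numel
    return total


# Toy model with PP = 2: the embedding is copied on the first and last stage.
stages = [
    {"word_embeddings.weight": 1000, "layer0.weight": 50},
    {"layer1.weight": 50, "word_embeddings.weight": 1000},
]

naive = sum(sum(params.values()) for params in stages)  # counts the embedding twice
unique = count_unique_parameters(stages, {"word_embeddings.weight"})
```

In this toy case the naive sum exceeds the deduplicated count by exactly the size of the tied embedding.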

Since some of us use deepspeed@master for other purposes, make sure we test against the correct DeepSpeed branch, deepspeed@big-science.

Trying to sort out how to run CI only for the latest push and cancel the previous run if it's still running.

This is solid enough that I'll go ahead and post a WIP PR. It's based on https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/60, so this will look noisy until that PR is merged. Most of the...

Replicate all the tensorboard logging in Meg-DS, plus logging hyperparams of choice. So on the code level: 1. repeat tensorboard 1:1 but log using mlflow api 2. find new places...
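One way to mirror scalar logging 1:1 to a second backend is a small fan-out wrapper. This is a hedged sketch, not the approach chosen in Meg-DS: `FanOutScalarLogger` and `ListSink` are hypothetical names, and the in-memory sink stands in for real backends so that the example runs without extra dependencies.

```python
class FanOutScalarLogger:
    """Hypothetical wrapper that forwards every scalar to all registered sinks."""

    def __init__(self, *sinks):
        # Each sink is any callable taking (tag, value, step).
        self.sinks = list(sinks)

    def add_scalar(self, tag, value, step):
        for sink in self.sinks:
            sink(tag, float(value), int(step))


class ListSink:
    """In-memory stand-in for a real logging backend."""

    def __init__(self):
        self.records = []

    def __call__(self, tag, value, step):
        self.records.append((tag, value, step))


# In a real setup the sinks could be thin adapters, for example (assumption):
#   lambda tag, value, step: writer.add_scalar(tag, value, step)       # TensorBoard
#   lambda tag, value, step: mlflow.log_metric(tag, value, step=step)  # MLflow
tensorboard_like = ListSink()
mlflow_like = ListSink()
logger = FanOutScalarLogger(tensorboard_like, mlflow_like)

logger.add_scalar("lm_loss", 2.5, 10)
logger.add_scalar("lm_loss", 2.25, 20)
```

With this shape, existing call sites keep a single `add_scalar` call and both backends receive identical records.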

MLFlow

Let's discuss which data is used in the test suite and, after the discussion, turn the outcome into guidelines for test writers. Here is a very rough start: * We want to...