Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
to make it easy for new users to use our setup
We now have a script that converts Megatron-DeepSpeed checkpoints to HF Transformers checkpoints. The project is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/tools/convert_checkpoint) and the script is [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/tools/convert_checkpoint/deepspeed_to_transformers.py). However, the script doesn't have unit tests that confirm that...
I just started to look at how to adapt [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/51a2e916b730cf676c66532b19d973a603377cb0/deepspeed/utils/zero_to_fp32.py) to extract fp32 weights from optimizer states. I will park this for now since it was said today fp16 weights...
This extends ``tools/preprocess_dataset_dist.py`` to handle JSONL files as an input dataset. It defines a new IndexedJSON class in ``tools/indexed_json.py`` that creates and uses an index for the JSONL file. The...
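As a rough illustration of what indexing a JSONL file can mean, the sketch below records the byte offset of each non-empty line so that any record can be read with a single seek. The helper names `build_line_offsets` and `read_record` are hypothetical and are not taken from ``tools/indexed_json.py``; the actual IndexedJSON class may store and use its index differently.

```python
import json


def build_line_offsets(path):
    """Return the byte offset at which each non-empty line of a JSONL file starts."""
    offsets = []
    position = 0
    # Binary mode keeps offsets exact even when records contain multi-byte UTF-8.
    with open(path, "rb") as f:
        for line in f:
            if line.strip():
                offsets.append(position)
            position += len(line)
    return offsets


def read_record(path, offsets, i):
    """Seek straight to record i and decode it, without scanning earlier lines."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        return json.loads(f.readline().decode("utf-8"))
```

With such an index, a record can be fetched by its position without reading the lines before it, which is the property a distributed preprocessing step would likely rely on.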
Currently, the parameter counts from `utils.get_parameters_in_billions` are inaccurate when PP > 1. Tied variables, in particular embedding layers, exist in several copies in the first and last PP stages, which causes...
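The effect of tied copies on a naive count can be shown with a small, framework-free sketch. It assumes each pipeline stage is described by a list of `(name, numel, tied_key)` tuples, where `tied_key` is a hypothetical label shared by all copies of one tied weight; this is only an illustration of the counting issue, not the fix applied in `utils.get_parameters_in_billions`.

```python
def count_naive(stages):
    """Sum every parameter on every stage, counting tied copies once per copy."""
    return sum(numel for stage in stages for _, numel, _ in stage)


def count_unique(stages):
    """Sum parameters across stages, counting each tied weight only once."""
    seen_tied = set()
    total = 0
    for stage in stages:
        for _, numel, tied_key in stage:
            if tied_key is not None:
                if tied_key in seen_tied:
                    continue  # a copy of this tied weight was already counted
                seen_tied.add(tied_key)
            total += numel
    return total


# Two pipeline stages; the embedding is tied and present on both.
stages = [
    [("embedding", 1000, "embed"), ("layer0", 50, None)],
    [("layer1", 50, None), ("embedding_copy", 1000, "embed")],
]
```

Here `count_naive(stages)` gives 2100 while `count_unique(stages)` gives 1100, which is the kind of overcount described above.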
Since some of us use deepspeed@master for other purposes, make sure we test against the correct DeepSpeed branch, deepspeed@big-science.
Trying to sort out how to run only the latest push and cancel the previous one if it's still running.
This is solid enough that I'll go ahead and post a WIP PR. It's based on https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/60, so this will look noisy until that PR is merged. Most of the...
Replicate all the TensorBoard logging in Meg-DS, plus logging of selected hyperparameters. At the code level: 1. replicate the TensorBoard logging 1:1, but log through the MLflow API; 2. find new places...
Let's discuss which data is used in the test suite and, after the discussion, turn the outcome into guidelines for test writers. Here is a very rough start: * We want to...