Megatron-DeepSpeed

Question about downloading checkpoints of 6.3B, 2.5B, 1.3B

Open misska1 opened this issue 2 years ago • 3 comments

Does BigScience also provide the original BLOOM checkpoints (without conversion to Hugging Face 🤗)? I am working on fine-tuning BLOOM (6.3B, 2.5B, 1.3B) and I need those checkpoint files. See also issues/315.

In https://github.com/bigscience-workshop/bigscience/tree/master/train/tr1-13B-base I found some URLs, but they are all offline:


I created 4 repos at https://huggingface.co/bigscience/ and now we can clone those as the dirs that the data will be output into:

cd $six_ALL_CCFRSCRATCH/checkpoints/tr1-13B
git clone https://huggingface.co/bigscience/tr1-13B-checkpoints checkpoints
git clone https://huggingface.co/bigscience/tr1-13B-tensorboard tensorboard
git clone https://huggingface.co/bigscience/tr1-13B-codecarbon codecarbon
git clone https://huggingface.co/bigscience/tr1-13B-logs logs

misska1, Aug 02 '22 09:08

You can convert the HF checkpoints back to Megatron-DeepSpeed. See this (a bit hacky) script: https://gist.github.com/malteos/c194368594e16439c101b7bf27195fd1

malteos, Aug 04 '22 14:08

@malteos Thank you for your answer! However, in your code I need to specify a DeepSpeed checkpoint:

        "checkpoint_dir",
        type=str,
        help="Path to the DeepSpeed checkpoint directory",

But I do not have a DeepSpeed checkpoint for (6.3B, 2.5B, 1.3B) to compare against.

misska1, Aug 05 '22 06:08
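
The fragment quoted in the comment above looks like the arguments of an argparse `add_argument` call. The following is a minimal sketch under that assumption, to show why the path cannot be omitted: a name without leading dashes is a positional argument, so it is required. Only the argument name, type, and help text come from the thread; `build_parser`, the description string, and the example path are hypothetical and are not taken from the gist.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: the real script may define more arguments.
    parser = argparse.ArgumentParser(
        description="Illustrative parser (not the gist's actual one)"
    )
    # A name without leading dashes is positional, hence mandatory.
    parser.add_argument(
        "checkpoint_dir",
        type=str,
        help="Path to the DeepSpeed checkpoint directory",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# safe to run anywhere; the path below is a placeholder.
example_args = build_parser().parse_args(["/path/to/ds_checkpoint"])
print(example_args.checkpoint_dir)
```

Calling `parse_args([])` on this parser exits with a usage error, which matches the situation described: the script needs some existing DeepSpeed checkpoint directory to work on.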

The script updates the weights of a DeepSpeed checkpoint directly on disk with the weights from an HF checkpoint. So you just need to save an untrained DS checkpoint first and update it afterwards. You can use the existing Slurm scripts for that and only set the number of train steps to 1.

malteos, Aug 05 '22 06:08
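
The idea described in the last reply, overwriting the weights of an untrained checkpoint with weights from another checkpoint, can be sketched roughly as below. This is not the gist's code: `overwrite_weights`, the key names, and the `name_map` argument are hypothetical, and plain dictionaries stand in for the state dicts. In a real run the two dictionaries would presumably be loaded with `torch.load` and written back with `torch.save`; the file layout of a DeepSpeed checkpoint is not described in this thread, so it is left out here.

```python
from typing import Any, Dict, List, Mapping


def overwrite_weights(
    target: Dict[str, Any],
    source: Mapping[str, Any],
    name_map: Mapping[str, str],
) -> List[str]:
    """Replace entries of `target` in place with entries from `source`.

    `name_map` maps a key in `target` to the matching key in `source`
    (parameter names usually differ between frameworks). Returns the
    list of target keys that were updated.
    """
    updated: List[str] = []
    for target_key, source_key in name_map.items():
        if target_key not in target:
            raise KeyError(f"target has no parameter {target_key!r}")
        if source_key not in source:
            raise KeyError(f"source has no parameter {source_key!r}")
        old, new = target[target_key], source[source_key]
        # Tensors expose .shape; refuse to copy if the shapes disagree.
        if getattr(old, "shape", None) != getattr(new, "shape", None):
            raise ValueError(f"shape mismatch for {target_key!r}")
        target[target_key] = new
        updated.append(target_key)
    return updated


# Toy data standing in for an untrained checkpoint and a trained one.
untrained = {"layer.weight": [0.0, 0.0], "layer.bias": [0.0]}
trained = {"h.weight": [0.5, -0.5], "h.bias": [0.1]}
changed = overwrite_weights(
    untrained, trained, {"layer.weight": "h.weight", "layer.bias": "h.bias"}
)
```

Keys that are absent from `name_map` are left untouched, which is why an untrained checkpoint saved after one training step can serve as the container: everything except the mapped weights keeps the values the training framework wrote.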