
How to convert a Hugging Face model to Megatron-DeepSpeed?

Open yayaQAQ opened this issue 1 year ago • 7 comments

As the title says.

yayaQAQ avatar Aug 12 '22 09:08 yayaQAQ

This is not possible. To download DS checkpoints refer to this issue: https://github.com/bigscience-workshop/Megatron-DeepSpeed/issues/319

mayank31398 avatar Aug 12 '22 14:08 mayank31398

Why? So I have to train from scratch? That's hard.

yayaQAQ avatar Aug 13 '22 03:08 yayaQAQ

I don't understand the issue. Do you just need to run inference? If that is the case, DS-inference is compatible with all Hugging Face models.

mayank31398 avatar Aug 13 '22 08:08 mayank31398

Hello, I have the same problem.

I want to load a Hugging Face model as the pre-trained weights and continue training it with the Megatron-DeepSpeed framework.

But I don't know how to convert the Hugging Face weights into Megatron-DeepSpeed weights.

I look forward to your help. Thank you.

AnShengqiang avatar Aug 18 '22 10:08 AnShengqiang

By the way:

Model structure: GPT

Model link: https://huggingface.co/TsinghuaAI/CPM-Generate

I want to train the model with 4-way pipeline parallelism and DeepSpeed.

AnShengqiang avatar Aug 18 '22 11:08 AnShengqiang

@AnShengqiang It's non-trivial to convert models for training. People are actively exploring this, as far as I know. This repository saves something called a universal checkpoint, which can be converted to other checkpoint formats. However, I am quite new here, so I don't really know how that works.
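One way to see why conversion for training is non-trivial: a Hugging Face checkpoint is a single flat set of named tensors, while pipeline-parallel training needs each stage to own only its slice of the layers. Below is a minimal sketch of that regrouping step only, under stated assumptions. The helper names `split_layers_into_stages` and `group_keys_by_stage` are hypothetical, the key names follow the GPT-2 style used by Hugging Face (`transformer.h.<layer>...`), and the even split is an illustration, not this repository's actual partitioning or checkpoint format. Tensor-parallel splitting, parameter renaming, and optimizer state are not covered.

```python
from typing import Dict, Iterable, List


def split_layers_into_stages(num_layers: int, num_stages: int) -> List[range]:
    """Assign contiguous layer indices to pipeline stages as evenly as possible."""
    if num_stages <= 0 or num_layers < num_stages:
        raise ValueError("need at least one layer per pipeline stage")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def group_keys_by_stage(
    keys: Iterable[str],
    num_layers: int,
    num_stages: int,
    layer_prefix: str = "transformer.h.",  # assumed GPT-2-style naming
) -> Dict[int, List[str]]:
    """Group flat checkpoint keys by the pipeline stage that would own them."""
    stages = split_layers_into_stages(num_layers, num_stages)
    layer_to_stage = {layer: s for s, layers in enumerate(stages) for layer in layers}
    grouped: Dict[int, List[str]] = {s: [] for s in range(num_stages)}
    for key in keys:
        if key.startswith(layer_prefix):
            # "transformer.h.7.mlp.c_fc.weight" -> layer index 7
            layer = int(key[len(layer_prefix):].split(".")[0])
            grouped[layer_to_stage[layer]].append(key)
        elif key.startswith(("transformer.wte", "transformer.wpe")):
            grouped[0].append(key)  # embeddings go to the first stage
        else:
            grouped[num_stages - 1].append(key)  # final norm / head go to the last stage
    return grouped
```

For example, `group_keys_by_stage(state_dict.keys(), num_layers=32, num_stages=4)` would list which tensors each of the 4 pipeline stages needs, where `state_dict` is whatever flat mapping was loaded from the Hugging Face checkpoint. A real conversion would still have to rename and reshape each tensor to match the target model's parameters.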

mayank31398 avatar Aug 19 '22 00:08 mayank31398

Thank you for your reply. I will look for an answer, and if I find one, I will post it here.

AnShengqiang avatar Aug 19 '22 01:08 AnShengqiang