
add megatron_mcore saver/loader

Open TING2938 opened this issue 1 year ago • 12 comments

Add a saver/loader for mcore checkpoint conversion.
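
For reference, a hedged sketch of how such a conversion might be driven end to end. The plugin names ("llama2_hf", "mcore"), flags, and paths below are assumptions based on the tools/checkpoint/convert.py interface, not taken verbatim from this PR.

```python
# Hedged sketch: invoke Megatron-LM's checkpoint converter with an mcore saver.
# Plugin names, flags, and paths are assumptions and may differ from this PR.
import subprocess

subprocess.run(
    [
        "python", "tools/checkpoint/convert.py",
        "--model-type", "GPT",
        "--loader", "llama2_hf",                 # assumed HF Llama-2 loader plugin
        "--saver", "mcore",                      # assumed name of the proposed mcore saver
        "--load-dir", "/path/to/hf/llama-2-7b-chat",
        "--save-dir", "/path/to/megatron/mcore-ckpt",
        "--tokenizer-model", "/path/to/tokenizer.model",
        "--target-tensor-parallel-size", "2",
        "--target-pipeline-parallel-size", "2",
    ],
    check=True,
)
```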

TING2938 avatar Feb 25 '24 07:02 TING2938

Hi, I think your conversion code has some potential bugs:

https://github.com/NVIDIA/Megatron-LM/issues/703

1049451037 avatar Feb 27 '24 04:02 1049451037

Hi, I think your conversion code has some potential bugs:

#703

I tried it with the vicuna 7b model, and the inference is OK:

(two screenshots of the inference output omitted)

You can add a conversation template for inference.

TING2938 avatar Feb 27 '24 05:02 TING2938

I changed the model from llama2-7b to llama2-7b-chat. Now it works! Thank you!

1049451037 avatar Feb 27 '24 05:02 1049451037

However, when I try llama-2-70b-chat, it doesn't work anymore... (I run with tensor parallel = 2 and pipeline parallel = 2.)

Is there a problem with the conversion, or with Megatron-LM?

The problem is that the numbers are never shown in the terminal:


```
Enter prompt: What is the answer of 1+1=?
Enter number of tokens to generate: 100
Megatron Response:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What is the answer of 1+1=? [/INST] The answer to the equation is "The answer to the equation is two (or "the result is two").

The calculation works like this because when you add one plus one (or +), the result is two (or =).

Therefore the answer is two."

Is there anything else I can help you with ?
```

1049451037 avatar Feb 27 '24 08:02 1049451037

I find that there is something wrong with tensor parallel in the conversion. Here are the experiments:

  1. TP=4, PP=1 ❌
  2. TP=2, PP=2 ❌
  3. TP=1, PP=4 ✅

So... how to fix it?

1049451037 avatar Feb 27 '24 15:02 1049451037

Finally, I figured out that I have to set parallel_output=False to make inference work with tensor parallel.

1049451037 avatar Feb 28 '24 02:02 1049451037

Finally, I figured out that I have to set parallel_output=False to make inference work with tensor parallel.

Thanks for your tests. text_generation_server on the main branch does not support mcore inference; maybe we can fix it:

https://github.com/NVIDIA/Megatron-LM/blob/ad53b1e38689a0ceed75ade7821f4e6c7554abb4/tools/run_text_generation_server.py#L22

TING2938 avatar Feb 28 '24 04:02 TING2938

I just use this model provider by setting parallel_output=False:

https://github.com/NVIDIA/Megatron-LM/blob/ad53b1e38689a0ceed75ade7821f4e6c7554abb4/pretrain_gpt.py#L54-L66
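
A minimal sketch of that provider with the one change discussed here, parallel_output=False, so every tensor-parallel rank gathers the full-vocab logits (what text generation needs) instead of keeping only its vocab shard. This is not a verified patch; import paths and helper names follow the Megatron-LM layout around the linked commit and may differ in other versions.

```python
# Sketch of an mcore GPT model provider with parallel_output=False for inference.
# Import paths/helpers are assumptions tied to the linked commit's layout.
from megatron import get_args
from megatron.arguments import core_transformer_config_from_args
from megatron.core.models.gpt import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
)


def model_provider(pre_process=True, post_process=True):
    args = get_args()
    config = core_transformer_config_from_args(args)
    return GPTModel(
        config=config,
        transformer_layer_spec=get_gpt_layer_with_transformer_engine_spec(),
        vocab_size=args.padded_vocab_size,
        max_sequence_length=args.max_position_embeddings,
        pre_process=pre_process,
        post_process=post_process,
        fp16_lm_cross_entropy=args.fp16_lm_cross_entropy,
        parallel_output=False,  # gather full-vocab logits on every TP rank
        share_embeddings_and_output_weights=not args.untie_embeddings_and_output_weights,
        position_embedding_type=args.position_embedding_type,
        rotary_percent=args.rotary_percent,
    )
```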

It works fine in text_generation_server.

1049451037 avatar Feb 28 '24 04:02 1049451037

Moreover, I find that padding the vocab is necessary to finetune llama-70b with tensor parallel >= 4.
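
A minimal sketch of why, assuming Megatron's usual padding rule (mirroring the _vocab_size_with_padding helper; the function name below is illustrative): the embedding table is rounded up to a multiple of make_vocab_size_divisible_by * tensor_model_parallel_size so the vocab-parallel embedding splits evenly across TP ranks.

```python
# Sketch of Megatron-style vocab padding; names here are illustrative.
def padded_vocab_size(orig_vocab_size: int,
                      make_vocab_size_divisible_by: int = 128,
                      tensor_model_parallel_size: int = 1) -> int:
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple


# Llama's 32000-token vocab divides evenly at TP=2 (multiple of 256) but not at
# TP=4 (multiple of 512), which is why padding only becomes necessary at TP >= 4.
print(padded_vocab_size(32000, 128, 2))  # 32000 -> no padding needed
print(padded_vocab_size(32000, 128, 4))  # 32256 -> padded by 256 tokens
```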

1049451037 avatar Feb 28 '24 07:02 1049451037

@TING2938 Is there a tool available for converting checkpoints that include optimizer states while changing the tensor parallelism (TP) and pipeline parallelism (PP) configurations? Thank you.

ftgreat avatar Mar 27 '24 02:03 ftgreat

Marking as stale. No activity in 60 days.

github-actions[bot] avatar May 26 '24 18:05 github-actions[bot]

@TING2938 Is there a tool available for converting checkpoints that include optimizer states while changing the tensor parallelism (TP) and pipeline parallelism (PP) configurations? Thank you.

The repo already includes tools to produce weights in partitioned (TP/PP) form; you can include optimizer states when doing SFT / continued-training jobs.

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Nov 08 '24 18:11 github-actions[bot]

@TING2938 Thanks for the contributions! Since this PR was opened, checkpoint conversion tools have evolved significantly. Megatron-LM has the functionality you proposed around HF Llama conversion.

In addition, Megatron-Bridge is the primary location for this functionality and provides comprehensive bidirectional Hugging Face ↔ Megatron conversion.

If there's any functionality not covered by existing checkpoint conversion tools, we'd welcome contributions to Megatron Bridge going forward.

sbhavani avatar Oct 06 '25 02:10 sbhavani