Megatron-LM
add Hopper llama golden with mcore calling stack
Add Hopper llama2 7b mcore golden example
--use-legacy-models - why is this option passed?
The latest updates use m-core models by default. For the llama2 benchmark test there is no need to switch to the m-core model and the new dataset API, so the flag keeps the classical (legacy) path.
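As a self-contained illustration of that defaulting behavior (a minimal sketch, not the actual Megatron-LM argument parser): the flag is off by default, which selects m-core, and the benchmark passes it explicitly to stay on the classical path.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use-legacy-models", action="store_true",
                    help="keep the classical (pre-mcore) model classes")

# The benchmark passes the flag explicitly; omitting it selects m-core.
args = parser.parse_args(["--use-legacy-models"])
print("legacy GPT model" if args.use_legacy_models else "m-core GPT model (default)")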
When I use the convert shell script in your commit, it shows the error
"Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/workspace/models/Llama-2-7b-hf/tokenizer.model'. Use `repo_type` argument if needed."
Do you know how to use a local tokenizer.model file?
Thank you.
I also found that using TOKENIZER_MODEL=meta-llama/Llama-2-7b-hf in the shell script converts HF to Megatron successfully.
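For context, here is a minimal reproduction of why the repo-id form works while the file path fails, assuming the convert script loads the tokenizer through HuggingFace transformers:

from transformers import AutoTokenizer

# A hub repo id or a local *directory* both work:
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok = AutoTokenizer.from_pretrained("/workspace/models/Llama-2-7b-hf")

# A path to the tokenizer.model *file* itself fails: anything that is not an
# existing directory is validated as a repo id, producing the error above.
# AutoTokenizer.from_pretrained("/workspace/models/Llama-2-7b-hf/tokenizer.model")
# -> HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'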
Hi @carlove, /workspace/models is the standard location where I keep models in the Docker container. You can create soft links at that location pointing to the real model if it is hosted on your distributed file system.
HF_MODEL_DIR=/workspace/models/$MODEL
OUTPUT=/workspace/models/$MODEL-to-megatron-tp$TP-pp$PP
TOKENIZER_MODEL=/workspace/models/$MODEL/tokenizer.model
For a throughput (tput) test you don't need to download the dataset or the model parameters; otherwise, run the convert script first to create a 3D-parallel (classical llama2) checkpoint, then load the weights and optimizer states depending on your task type.
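For reference, a hedged sketch of driving that conversion step from Python, mirroring the variables above; the script path and flag names are assumptions based on Megatron-LM's checkpoint tools and may differ in your tree:

import subprocess

MODEL, TP, PP = "Llama-2-7b-hf", 1, 1

# Script path, loader, and saver names are assumptions; check tools/checkpoint/
# in your Megatron tree for the exact spelling in your release.
subprocess.run([
    "python", "tools/checkpoint/convert.py",
    "--model-type", "GPT",
    "--loader", "llama2_hf",                  # HF llama2 loader (name varies by release)
    "--saver", "megatron",                    # write a classical (legacy) checkpoint
    "--target-tensor-parallel-size", str(TP),
    "--target-pipeline-parallel-size", str(PP),
    "--load-dir", f"/workspace/models/{MODEL}",
    "--save-dir", f"/workspace/models/{MODEL}-to-megatron-tp{TP}-pp{PP}",
    "--tokenizer-model", f"/workspace/models/{MODEL}/tokenizer.model",
], check=True)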
Which Llama2 tokenizer to use depends on the tokenizer class you choose. For the latest Megatron (> 2403) I recommend the meta-llama2 tokenizer; for old Megatron (< 2310) I recommend HuggingFaceLlama2Tokenizer.
Here is the difference:
# megatron > 2403 uses the sentencepiece proto API to load the Meta 32K BPE vocabulary
class _Llama2Tokenizer(_SentencePieceTokenizer):
    """SentencePieceTokenizer-Megatron wrapper"""

    def __init__(self, model_file):
        super().__init__(model_file, vocab_extra_ids=0)

    def _initalize(self, vocab_extra_ids):  # spelling matches the base-class hook it overrides
        self._populate_vocab()

        # BOS / EOS token IDs
        self.n_words: int = self.tokenizer.vocab_size()
        self.bos_id: int = self.tokenizer.bos_id()
        self.eos_id: int = self.tokenizer.eos_id()
        self.pad_id: int = self.tokenizer.pad_id()
        assert self.tokenizer.vocab_size() == self.tokenizer.get_piece_size()
        ...
which uses sentencepiece to load the model:
import sentencepiece
self.tokenizer = sentencepiece.SentencePieceProcessor(model_file=model_file)
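As a quick sanity check, the same model file can be loaded directly with sentencepiece; the path below is the one from this thread, and the commented values are the standard llama2 ones:

import sentencepiece

sp = sentencepiece.SentencePieceProcessor(
    model_file="/workspace/models/Llama-2-7b-hf/tokenizer.model")
print(sp.vocab_size())                        # 32000: the Meta 32K BPE vocabulary
print(sp.bos_id(), sp.eos_id(), sp.pad_id())  # 1 2 -1 for llama2
ids = sp.encode("hello megatron")
print(sp.decode(ids))                         # round-trips to the input text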
That means Megatron after 2403 is built for meta-llama.
Marking as stale. No activity in 60 days.
@yiakwy-xpu-ml-framework-team thank you for the contribution! We have now added a llama3 example here: https://github.com/NVIDIA/Megatron-LM/tree/main/examples/llama