Megatron-LM
add Hopper llama golden with mcore calling stack
Add Hopper llama2 7b mcore golden example
--use-legacy-models - why is this option passed?
The latest updates use m-core models by default. For the llama2 benchmark test there is no need to switch to the m-core model and the new dataset API, so the flag keeps the classical (legacy) path.
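As a self-contained illustration of that defaulting behavior (a minimal sketch, not the actual Megatron-LM argument parser): the flag is off by default, which selects m-core, and the benchmark passes it explicitly to stay on the classical path.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use-legacy-models", action="store_true",
                    help="keep the classical (pre-mcore) model classes")

# The benchmark passes the flag explicitly; omitting it selects m-core.
args = parser.parse_args(["--use-legacy-models"])
print("legacy GPT model" if args.use_legacy_models else "m-core GPT model (default)")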
When I use the convert shell script in your commit, it shows the error
"Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/workspace/models/Llama-2-7b-hf/tokenizer.model'. Use `repo_type` argument if needed."
Do you know how to use a local tokenizer.model file?
Thank you.
I also found that using TOKENIZER_MODEL=meta-llama/Llama-2-7b-hf in the shell script converts HF to Megatron successfully.
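For context, here is a minimal reproduction of why the repo-id form works while the file path fails, assuming the convert script loads the tokenizer through HuggingFace transformers:

from transformers import AutoTokenizer

# A hub repo id or a local *directory* both work:
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok = AutoTokenizer.from_pretrained("/workspace/models/Llama-2-7b-hf")

# A path to the tokenizer.model *file* itself fails: anything that is not an
# existing directory is validated as a repo id, producing the error above.
# AutoTokenizer.from_pretrained("/workspace/models/Llama-2-7b-hf/tokenizer.model")
# -> HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'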
Hi @carlove, /workspace/models is the standard location where I keep models in the Docker container. You can create soft links at that location pointing to the real model if it is hosted on your distributed file system.
HF_MODEL_DIR=/workspace/models/$MODEL
OUTPUT=/workspace/models/$MODEL-to-megatron-tp$TP-pp$PP
TOKENIZER_MODEL=/workspace/models/$MODEL/tokenizer.model
For a throughput (tput) test you don't need to download the dataset or the model parameters; otherwise, run the convert script first to create a 3D-parallel (classical llama2) checkpoint, then load the weights and optimizer states depending on your task type.
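For reference, a hedged sketch of driving that conversion step from Python, mirroring the variables above; the script path and flag names are assumptions based on Megatron-LM's checkpoint tools and may differ in your tree:

import subprocess

MODEL, TP, PP = "Llama-2-7b-hf", 1, 1

# Script path, loader, and saver names are assumptions; check tools/checkpoint/
# in your Megatron tree for the exact spelling in your release.
subprocess.run([
    "python", "tools/checkpoint/convert.py",
    "--model-type", "GPT",
    "--loader", "llama2_hf",                  # HF llama2 loader (name varies by release)
    "--saver", "megatron",                    # write a classical (legacy) checkpoint
    "--target-tensor-parallel-size", str(TP),
    "--target-pipeline-parallel-size", str(PP),
    "--load-dir", f"/workspace/models/{MODEL}",
    "--save-dir", f"/workspace/models/{MODEL}-to-megatron-tp{TP}-pp{PP}",
    "--tokenizer-model", f"/workspace/models/{MODEL}/tokenizer.model",
], check=True)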
Which Llama2 tokenizer to use depends on the tokenizer class you choose. For the latest Megatron (> 2403) I recommend the meta-llama2 tokenizer; for old Megatron (< 2310) I recommend HuggingFaceLlama2Tokenizer.
Here is the difference:
# megatron > 2403 uses the sentencepiece proto API to load the Meta 32K BPE vocabulary
class _Llama2Tokenizer(_SentencePieceTokenizer):
    """SentencePieceTokenizer-Megatron wrapper"""

    def __init__(self, model_file):
        super().__init__(model_file, vocab_extra_ids=0)

    def _initalize(self, vocab_extra_ids):  # spelling matches the base-class hook it overrides
        self._populate_vocab()

        # BOS / EOS token IDs
        self.n_words: int = self.tokenizer.vocab_size()
        self.bos_id: int = self.tokenizer.bos_id()
        self.eos_id: int = self.tokenizer.eos_id()
        self.pad_id: int = self.tokenizer.pad_id()
        assert self.tokenizer.vocab_size() == self.tokenizer.get_piece_size()
        ...
which uses sentencepiece to load the model:
import sentencepiece
self.tokenizer = sentencepiece.SentencePieceProcessor(model_file=model_file)
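As a quick sanity check, the same model file can be loaded directly with sentencepiece; the path below is the one from this thread, and the commented values are the standard llama2 ones:

import sentencepiece

sp = sentencepiece.SentencePieceProcessor(
    model_file="/workspace/models/Llama-2-7b-hf/tokenizer.model")
print(sp.vocab_size())                        # 32000: the Meta 32K BPE vocabulary
print(sp.bos_id(), sp.eos_id(), sp.pad_id())  # 1 2 -1 for llama2
ids = sp.encode("hello megatron")
print(sp.decode(ids))                         # round-trips to the input text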
That means Megatron after 2403 is built for meta-llama.
Marking as stale. No activity in 60 days.
@yiakwy-xpu-ml-framework-team thank you for the contribution! We have now added a llama3 example here: https://github.com/NVIDIA/Megatron-LM/tree/main/examples/llama