Megatron-LM for LLaMA-3
I'm attempting to train LLaMA-3 using Megatron-LM but have encountered an issue: LLaMA-3 utilizes Tiktoken for tokenization and doesn't provide a tokenizer.model file, which is required by Megatron-LM. How can I adapt or generate a compatible tokenizer.model for Megatron-LM? Any guidance or workaround would be greatly appreciated!
There is a tokenizer.model file in the Hugging Face Checkpoints under the /original folder, check https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/original/tokenizer.model
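If it helps, here is a minimal sketch of fetching just that file with huggingface_hub (assuming you have accepted the Llama 3 license for the gated repo and are logged in, e.g. via huggingface-cli login; the repo_id and filename match the link above):

# Sketch: download only original/tokenizer.model from the gated repo.
# Assumes `pip install huggingface_hub` and prior authentication/license acceptance.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    filename="original/tokenizer.model",
)
print(path)  # local cache path of the downloaded tokenizer file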
Try starting over from scratch and carefully review what you typed. Sometimes machines fail.
Also check out the llama3 example in the NeMo Framework Launcher: https://github.com/NVIDIA/NeMo-Framework-Launcher/blob/main/examples/training/llama/h100/llama3_8b_bf16.sh
There is a tokenizer.model file in the Hugging Face Checkpoints under the /original folder, check https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/original/tokenizer.model
This doesn't work directly, because that model file cannot be loaded with SentencePiece.
Here's the error:
Traceback (most recent call last):
File "/Users/dsdsdds/Downloads/check_tokenizer_model.py", line 5, in <module>
print(sp.Load("./tokenizer.model"))
File "/Users/dsdsdds/anaconda3/envs/moe/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/Users/dsdsdds/anaconda3/envs/moe/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ./tokenizer.model
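For what it's worth, that error is expected: original/tokenizer.model for Llama 3 is a tiktoken BPE ranks file, not a SentencePiece ModelProto, so sp.Load() will always fail on it. A quick sketch to confirm (the local path is an assumption):

# Sketch: SentencePiece cannot parse the Llama 3 tokenizer.model,
# but tiktoken's loader can, because it is a tiktoken BPE ranks file.
import sentencepiece as spm
from tiktoken.load import load_tiktoken_bpe

path = "./tokenizer.model"  # assumed local copy of original/tokenizer.model

try:
    spm.SentencePieceProcessor().Load(path)
except RuntimeError as e:
    print("SentencePiece cannot parse it:", e)

ranks = load_tiktoken_bpe(path)  # succeeds: maps byte sequences to BPE ranks
print("tiktoken BPE ranks loaded, base vocab size:", len(ranks))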
That's true! But bypassing this is pretty easy, just create a new tokenizer like the one of Llama2. You can do self.tokenizer = AutoTokenizer.from_pretrained() and change a bit some methods (For example, def tokenize(...): return self.tokenizer(....)).
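Something along these lines, for example (a rough sketch only; the class name _HuggingFaceLlama3Tokenizer and the exact method set are my assumptions, so adapt it to whatever the Llama2 class in tokenizer.py actually implements):

# Rough sketch of a Llama 3 wrapper built on Hugging Face AutoTokenizer,
# modeled loosely on the existing Llama2 tokenizer class in
# megatron/training/tokenizer/tokenizer.py. Names and properties are assumptions.
from transformers import AutoTokenizer


class _HuggingFaceLlama3Tokenizer:
    def __init__(self, pretrained_model_name_or_path):
        # e.g. "meta-llama/Meta-Llama-3-8B" or a local HF checkpoint folder
        self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)

    def tokenize(self, text):
        # Return plain token ids; end-of-document handling goes through the
        # eod property below where Megatron needs it.
        return self.tokenizer(text, add_special_tokens=False)["input_ids"]

    def detokenize(self, ids):
        return self.tokenizer.decode(ids)

    @property
    def vocab_size(self):
        return len(self.tokenizer)

    @property
    def eod(self):
        return self.tokenizer.eos_token_id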
True
Indeed.
I also encountered the same problem. Could you please share your configuration? Thank you very much
LLaMA-3 is really not usable in the latest Megatron-LM. I hope the official team can fix this.
Marking as stale. No activity in 60 days.
Just use the tokenizer below and add a case for Llama3Tokenizer in arguments.py and in the build_tokenizer function in megatron/training/tokenizer/tokenizer.py (see the registration sketch after the code). It seems to work this way.
# Intended to live in megatron/training/tokenizer/tokenizer.py, where
# MegatronTokenizer is already available.
def create_llama3_tokenizer(*args, **kwargs):

    class _Llama3Tokenizer(MegatronTokenizer):
        """Thin wrapper around the official (tiktoken-based) Llama 3 tokenizer."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Official Llama 3 tokenizer, installed from the meta-llama/llama3 repo.
            from llama.tokenizer import Tokenizer as Llama3Tokenizer
            self.tokenizer = Llama3Tokenizer(*args, **kwargs)

        def instruct_tokenize(self, s: str, bos=True, eos=False):
            '''Default args for text completion, not chat/dialog.'''
            assert type(s) is str
            t = self.tokenizer.encode(s, bos=bos, eos=eos, allowed_special='all')
            return t

        def tokenize(self, s: str, bos=True, eos=False):
            '''Default args for text completion, not chat/dialog.'''
            assert type(s) is str
            t = self.tokenizer.encode(s, bos=bos, eos=eos, allowed_special='all')
            return t

        def detokenize(self, ids):
            return self.tokenizer.decode(ids)

        @property
        def vocab(self):
            return self.tokenizer.vocab

        @property
        def inv_vocab(self):
            return self.tokenizer.inv_vocab

        @property
        def cls(self):
            return -1

        @property
        def sep(self):
            return -1

        @property
        def mask(self):
            return -1

        @property
        def eod(self):
            # Llama 3's end-of-text id serves as Megatron's end-of-document token.
            return self.tokenizer.eos_id

        @property
        def additional_special_tokens_ids(self):
            return None

        @property
        def vocab_size(self):
            return self.tokenizer.model.n_vocab

    return _Llama3Tokenizer(*args, **kwargs)
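For the registration step mentioned above, the wiring could look roughly like this (a sketch only; the 'Llama3Tokenizer' choice string is an assumption, and the real build_tokenizer in tokenizer.py is an if/elif chain that differs a bit between Megatron versions):

# Sketch of how the new type could be dispatched. In practice, add 'Llama3Tokenizer'
# to the --tokenizer-type choices in megatron/training/arguments.py and an elif
# branch like the one below inside build_tokenizer() in
# megatron/training/tokenizer/tokenizer.py.
def build_tokenizer_sketch(args):
    if args.tokenizer_type == 'Llama3Tokenizer':
        assert args.tokenizer_model is not None
        return create_llama3_tokenizer(args.tokenizer_model)
    raise NotImplementedError(f'{args.tokenizer_type} tokenizer is not implemented.')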
@eliird where does from llama.tokenizer import Tokenizer as Llama3Tokenizer come from?
It's the official Llama 3 tokenizer from this repo: https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py. You can clone the repo and install it with pip install -e .
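After installing it that way, a quick sanity check that the import resolves and that the file loads (the path below is an assumption; point it at your local original/tokenizer.model):

# Sketch: verify the meta-llama/llama3 tokenizer works after `pip install -e .`
from llama.tokenizer import Tokenizer as Llama3Tokenizer

tok = Llama3Tokenizer(model_path="/path/to/Meta-Llama-3-8B/original/tokenizer.model")
ids = tok.encode("Hello, world!", bos=True, eos=False)
print(ids)
print(tok.decode(ids))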
Thank you! It works for me:)
@dongmin-ra can you explain how you made it work?
I followed @eliird's steps: I added the create_llama3_tokenizer function and installed the llama3 dependency with pip install -e .
I also used tokenizer.model from huggingface/Meta-Llama-3-8B, but I get the following error:
Traceback (most recent call last):
File "/workspace/Megatron-LM/pretrain_gpt.py", line 278, in <module>
pretrain(
File "/workspace/Megatron-LM/megatron/training/training.py", line 263, in pretrain
initialize_megatron(
File "/workspace/Megatron-LM/megatron/training/initialize.py", line 73, in initialize_megatron
set_global_variables(args)
File "/workspace/Megatron-LM/megatron/training/global_vars.py", line 93, in set_global_variables
_ = _build_tokenizer(args)
File "/workspace/Megatron-LM/megatron/training/global_vars.py", line 142, in _build_tokenizer
_GLOBAL_TOKENIZER = build_tokenizer(args)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 55, in build_tokenizer
tokenizer = create_llama3_tokenizer(args.tokenizer_model)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 665, in create_llama3_tokenizer
return _Llama3Tokenizer(*args, **kwargs)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 612, in __init__
self.tokenizer = Llama3Tokenizer(*args, **kwargs)
File "/ssd/llama3-main/llama/tokenizer.py", line 58, in __init__
mergeable_ranks = load_tiktoken_bpe(model_path)
File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 117, in load_tiktoken_bpe
return {
File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 118, in <dictcomp>
base64.b64decode(token): int(rank)
File "/usr/lib/python3.10/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
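Just a guess, but an "Incorrect padding" error from base64 usually means the bytes being decoded are not the real tiktoken ranks at all, for example if the tokenizer.model on disk is a Git LFS pointer stub (from a download without git-lfs) or some other wrong file. One way to check (the path is an assumption):

# Sketch: check whether tokenizer.model looks like real tiktoken BPE ranks
# (lines of "<base64 token> <rank>") or a Git LFS pointer stub.
# This diagnosis is only a guess.
path = "/path/to/original/tokenizer.model"

with open(path, "rb") as f:
    head = f.read(200)

if head.startswith(b"version https://git-lfs.github.com/spec"):
    print("This is a Git LFS pointer, not the tokenizer; re-download with git lfs.")
else:
    print("First bytes:", head[:60])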
That was a patchy fix. I think it is better to specify the tokenizer as the Hugging Face tokenizer:
--tokenizer-type HuggingFaceTokenizer
--tokenizer-model <path to the downloaded llama3 folder that contains the original/tokenizer.model file> (make sure the path points to the folder you downloaded, not to the tokenizer.model file itself)
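As far as I know, the HuggingFaceTokenizer type just wraps transformers' AutoTokenizer, so a quick way to verify the path is right is to load the folder with transformers directly (the folder path below is an assumption; it should be the HF Meta-Llama-3-8B checkpoint folder, not the original/tokenizer.model file):

# Sketch: verify the checkpoint folder loads with transformers' AutoTokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/Meta-Llama-3-8B")
ids = tok("Hello, world!", add_special_tokens=False)["input_ids"]
print(ids, tok.decode(ids))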
I followed your suggestion and used the following config:
--use-mcore-models
--seq-length 8192
--num-layers 32
--hidden-size 4096
--ffn-hidden-size 14336
--num-attention-heads 32
--swiglu
--untie-embeddings-and-output-weights
--no-position-embedding
--use-rotary-position-embeddings
--max-position-embeddings 8192
--normalization 'RMSNorm'
--tokenizer-type 'HuggingFaceTokenizer'
and now I get a CUDA out of memory error:
Traceback (most recent call last):
  File "/workspace/Megatron-LM/pretrain_gpt.py", line 278, in <module>
    pretrain(
  File "/workspace/Megatron-LM/megatron/training/training.py", line 376, in pretrain
    iteration, num_floating_point_operations_so_far = train(
  File "/workspace/Megatron-LM/megatron/training/training.py", line 1432, in train
    train_step(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training/training.py", line 754, in train_step
    losses_reduced = forward_backward_func(
  File "/workspace/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 452, in forward_backward_no_pipelining
    output_tensor, num_tokens = forward_step(
  File "/workspace/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 274, in forward_step
    output_tensor, loss_func = forward_step_func(data_iterator, model)
  File "/workspace/Megatron-LM/pretrain_gpt.py", line 205, in forward_step
    output_tensor = model(tokens, position_ids, attention_mask,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/distributed/data_parallel_base.py", line 22, in forward
    return self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/legacy/model/module.py", line 189, in forward
    outputs = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 261, in forward
    logits, _ = self.output_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 943, in forward
    output_parallel = self._forward_impl(
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 668, in linear_with_grad_accumulation_and_async_allreduce
    return LinearWithGradAccumulationAndAsyncCommunication.apply(*args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 551, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 439, in forward
    output = torch.matmul(total_input, weight.t())
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 7 has a total capacity of 79.10 GiB of which 1.63 GiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 74.52 GiB is allocated by PyTorch, and 204.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(Rank 0 hit the same torch.cuda.OutOfMemoryError on GPU 0: 1.80 GiB free, 74.53 GiB allocated by PyTorch.)
[2025-01-10 21:45:00,561] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 661) of binary: /usr/bin/python
I'm using two A3 Mega nodes.
OOM has nothing to do with the tokenizer. If the model does not fit on your GPUs, either use the SGD optimizer (--optimizer sgd), decrease the training batch size, or use the tensor parallel and pipeline parallel options. Try --tensor-model-parallel-size num_gpus, replacing num_gpus with the number of GPUs you have.
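For example, on 8 x 80 GB GPUs per node, something like the following could be a starting point (these values are assumptions, not a tested recipe; tune them for your setup):
--tensor-model-parallel-size 8
--pipeline-model-parallel-size 1
--micro-batch-size 1
--global-batch-size 128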
Marking as stale. No activity in 60 days.