Megatron-LM for LLaMA-3
I'm attempting to train LLaMA-3 using Megatron-LM but have encountered an issue: LLaMA-3 utilizes Tiktoken for tokenization and doesn't provide a tokenizer.model file, which is required by Megatron-LM. How can I adapt or generate a compatible tokenizer.model for Megatron-LM? Any guidance or workaround would be greatly appreciated!
There is a tokenizer.model file in the Hugging Face Checkpoints under the /original folder, check https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/original/tokenizer.model
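If it helps, here is a minimal sketch of fetching just that file with huggingface_hub (assuming you have accepted the Llama 3 license for the gated repo and are logged in, e.g. via huggingface-cli login; the repo_id and filename match the link above):

# Sketch: download only original/tokenizer.model from the gated repo.
# Assumes `pip install huggingface_hub` and prior authentication/license acceptance.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    filename="original/tokenizer.model",
)
print(path)  # local cache path of the downloaded tokenizer file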
Try starting over from scratch and carefully review what you typed. Sometimes machines fail.
Also check out the llama3 example in the NeMo Framework Launcher: https://github.com/NVIDIA/NeMo-Framework-Launcher/blob/main/examples/training/llama/h100/llama3_8b_bf16.sh
There is a tokenizer.model file in the Hugging Face Checkpoints under the /original folder, check https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/original/tokenizer.model
This doesn't work directly, because that model file cannot be loaded with SentencePiece.
Here's the error:
Traceback (most recent call last):
File "/Users/dsdsdds/Downloads/check_tokenizer_model.py", line 5, in <module>
print(sp.Load("./tokenizer.model"))
File "/Users/dsdsdds/anaconda3/envs/moe/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/Users/dsdsdds/anaconda3/envs/moe/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ./tokenizer.model
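For what it's worth, that error is expected: original/tokenizer.model for Llama 3 is a tiktoken BPE ranks file, not a SentencePiece ModelProto, so sp.Load() will always fail on it. A quick sketch to confirm (the local path is an assumption):

# Sketch: SentencePiece cannot parse the Llama 3 tokenizer.model,
# but tiktoken's loader can, because it is a tiktoken BPE ranks file.
import sentencepiece as spm
from tiktoken.load import load_tiktoken_bpe

path = "./tokenizer.model"  # assumed local copy of original/tokenizer.model

try:
    spm.SentencePieceProcessor().Load(path)
except RuntimeError as e:
    print("SentencePiece cannot parse it:", e)

ranks = load_tiktoken_bpe(path)  # succeeds: maps byte sequences to BPE ranks
print("tiktoken BPE ranks loaded, base vocab size:", len(ranks))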
That's true! But bypassing this is pretty easy, just create a new tokenizer like the one of Llama2. You can do self.tokenizer = AutoTokenizer.from_pretrained() and change a bit some methods (For example, def tokenize(...): return self.tokenizer(....)).
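Something along these lines, for example (a rough sketch only; the class name _HuggingFaceLlama3Tokenizer and the exact method set are my assumptions, so adapt it to whatever the Llama2 class in tokenizer.py actually implements):

# Rough sketch of a Llama 3 wrapper built on Hugging Face AutoTokenizer,
# modeled loosely on the existing Llama2 tokenizer class in
# megatron/training/tokenizer/tokenizer.py. Names and properties are assumptions.
from transformers import AutoTokenizer


class _HuggingFaceLlama3Tokenizer:
    def __init__(self, pretrained_model_name_or_path):
        # e.g. "meta-llama/Meta-Llama-3-8B" or a local HF checkpoint folder
        self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)

    def tokenize(self, text):
        # Return plain token ids; end-of-document handling goes through the
        # eod property below where Megatron needs it.
        return self.tokenizer(text, add_special_tokens=False)["input_ids"]

    def detokenize(self, ids):
        return self.tokenizer.decode(ids)

    @property
    def vocab_size(self):
        return len(self.tokenizer)

    @property
    def eod(self):
        return self.tokenizer.eos_token_id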
True
Indeed.
I also encountered the same problem. Could you please share your configuration? Thank you very much
LLaMA-3 is really not usable in the latest Megatron-LM. I hope the official team can fix this.
Marking as stale. No activity in 60 days.
Just use the tokenizer below and add a case for Llama3Tokenizer in arguments.py and in the build_tokenizer function in megatron/training/tokenizer/tokenizer.py (see the registration sketch after the code). It seems to work this way.
# Intended to live in megatron/training/tokenizer/tokenizer.py, where
# MegatronTokenizer is already available.
def create_llama3_tokenizer(*args, **kwargs):

    class _Llama3Tokenizer(MegatronTokenizer):
        """Thin wrapper around the official (tiktoken-based) Llama 3 tokenizer."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Official Llama 3 tokenizer, installed from the meta-llama/llama3 repo.
            from llama.tokenizer import Tokenizer as Llama3Tokenizer
            self.tokenizer = Llama3Tokenizer(*args, **kwargs)

        def instruct_tokenize(self, s: str, bos=True, eos=False):
            '''Default args for text completion, not chat/dialog.'''
            assert type(s) is str
            t = self.tokenizer.encode(s, bos=bos, eos=eos, allowed_special='all')
            return t

        def tokenize(self, s: str, bos=True, eos=False):
            '''Default args for text completion, not chat/dialog.'''
            assert type(s) is str
            t = self.tokenizer.encode(s, bos=bos, eos=eos, allowed_special='all')
            return t

        def detokenize(self, ids):
            return self.tokenizer.decode(ids)

        @property
        def vocab(self):
            return self.tokenizer.vocab

        @property
        def inv_vocab(self):
            return self.tokenizer.inv_vocab

        @property
        def cls(self):
            return -1

        @property
        def sep(self):
            return -1

        @property
        def mask(self):
            return -1

        @property
        def eod(self):
            # Llama 3's end-of-text id serves as Megatron's end-of-document token.
            return self.tokenizer.eos_id

        @property
        def additional_special_tokens_ids(self):
            return None

        @property
        def vocab_size(self):
            return self.tokenizer.model.n_vocab

    return _Llama3Tokenizer(*args, **kwargs)
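For the registration step mentioned above, the wiring could look roughly like this (a sketch only; the 'Llama3Tokenizer' choice string is an assumption, and the real build_tokenizer in tokenizer.py is an if/elif chain that differs a bit between Megatron versions):

# Sketch of how the new type could be dispatched. In practice, add 'Llama3Tokenizer'
# to the --tokenizer-type choices in megatron/training/arguments.py and an elif
# branch like the one below inside build_tokenizer() in
# megatron/training/tokenizer/tokenizer.py.
def build_tokenizer_sketch(args):
    if args.tokenizer_type == 'Llama3Tokenizer':
        assert args.tokenizer_model is not None
        return create_llama3_tokenizer(args.tokenizer_model)
    raise NotImplementedError(f'{args.tokenizer_type} tokenizer is not implemented.')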
@eliird where does from llama.tokenizer import Tokenizer as Llama3Tokenizer come from?
It's the official Llama 3 tokenizer from this repo: https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py. You can clone the repo and install it with pip install -e .
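After installing it that way, a quick sanity check that the import resolves and that the file loads (the path below is an assumption; point it at your local original/tokenizer.model):

# Sketch: verify the meta-llama/llama3 tokenizer works after `pip install -e .`
from llama.tokenizer import Tokenizer as Llama3Tokenizer

tok = Llama3Tokenizer(model_path="/path/to/Meta-Llama-3-8B/original/tokenizer.model")
ids = tok.encode("Hello, world!", bos=True, eos=False)
print(ids)
print(tok.decode(ids))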
Thank you! It works for me:)
@dongmin-ra can you explain how you made it work?
I followed @eliird's steps: I added the create_llama3_tokenizer function and installed the llama3 dependency with pip install -e .
I also used tokenizer.model from huggingface/Meta-Llama-3-8B, but I get the following error:
Traceback (most recent call last):
File "/workspace/Megatron-LM/pretrain_gpt.py", line 278, in <module>
pretrain(
File "/workspace/Megatron-LM/megatron/training/training.py", line 263, in pretrain
initialize_megatron(
File "/workspace/Megatron-LM/megatron/training/initialize.py", line 73, in initialize_megatron
set_global_variables(args)
File "/workspace/Megatron-LM/megatron/training/global_vars.py", line 93, in set_global_variables
_ = _build_tokenizer(args)
File "/workspace/Megatron-LM/megatron/training/global_vars.py", line 142, in _build_tokenizer
_GLOBAL_TOKENIZER = build_tokenizer(args)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 55, in build_tokenizer
tokenizer = create_llama3_tokenizer(args.tokenizer_model)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 665, in create_llama3_tokenizer
return _Llama3Tokenizer(*args, **kwargs)
File "/workspace/Megatron-LM/megatron/training/tokenizer/tokenizer.py", line 612, in __init__
self.tokenizer = Llama3Tokenizer(*args, **kwargs)
File "/ssd/llama3-main/llama/tokenizer.py", line 58, in __init__
mergeable_ranks = load_tiktoken_bpe(model_path)
File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 117, in load_tiktoken_bpe
return {
File "/usr/local/lib/python3.10/dist-packages/tiktoken/load.py", line 118, in <dictcomp>
base64.b64decode(token): int(rank)
File "/usr/lib/python3.10/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
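Just a guess, but an "Incorrect padding" error from base64 usually means the bytes being decoded are not the real tiktoken ranks at all, for example if the tokenizer.model on disk is a Git LFS pointer stub (from a download without git-lfs) or some other wrong file. One way to check (the path is an assumption):

# Sketch: check whether tokenizer.model looks like real tiktoken BPE ranks
# (lines of "<base64 token> <rank>") or a Git LFS pointer stub.
# This diagnosis is only a guess.
path = "/path/to/original/tokenizer.model"

with open(path, "rb") as f:
    head = f.read(200)

if head.startswith(b"version https://git-lfs.github.com/spec"):
    print("This is a Git LFS pointer, not the tokenizer; re-download with git lfs.")
else:
    print("First bytes:", head[:60])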
That was a patchy fix. I think it is better to specify the tokenizer as the Hugging Face tokenizer:
--tokenizer-type HuggingFaceTokenizer
--tokenizer-model <path to the downloaded llama3 folder that contains the original/tokenizer.model file> (make sure the path points to the folder you downloaded, not to the tokenizer.model file itself)
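As far as I know, the HuggingFaceTokenizer type just wraps transformers' AutoTokenizer, so a quick way to verify the path is right is to load the folder with transformers directly (the folder path below is an assumption; it should be the HF Meta-Llama-3-8B checkpoint folder, not the original/tokenizer.model file):

# Sketch: verify the checkpoint folder loads with transformers' AutoTokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/Meta-Llama-3-8B")
ids = tok("Hello, world!", add_special_tokens=False)["input_ids"]
print(ids, tok.decode(ids))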
I followed your suggestion and used the following config:
--use-mcore-models
--seq-length 8192
--num-layers 32
--hidden-size 4096
--ffn-hidden-size 14336
--num-attention-heads 32
--swiglu
--untie-embeddings-and-output-weights
--no-position-embedding
--use-rotary-position-embeddings
--max-position-embeddings 8192
--normalization 'RMSNorm'
--tokenizer-type 'HuggingFaceTokenizer'
and now I get a CUDA out of memory error:
Traceback (most recent call last):
  File "/workspace/Megatron-LM/pretrain_gpt.py", line 278, in <module>
    pretrain(
  File "/workspace/Megatron-LM/megatron/training/training.py", line 376, in pretrain
    iteration, num_floating_point_operations_so_far = train(
  File "/workspace/Megatron-LM/megatron/training/training.py", line 1432, in train
    train_step(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training/training.py", line 754, in train_step
    losses_reduced = forward_backward_func(
  File "/workspace/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 452, in forward_backward_no_pipelining
    output_tensor, num_tokens = forward_step(
  File "/workspace/Megatron-LM/megatron/core/pipeline_parallel/schedules.py", line 274, in forward_step
    output_tensor, loss_func = forward_step_func(data_iterator, model)
  File "/workspace/Megatron-LM/pretrain_gpt.py", line 205, in forward_step
    output_tensor = model(tokens, position_ids, attention_mask,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/distributed/data_parallel_base.py", line 22, in forward
    return self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/legacy/model/module.py", line 189, in forward
    outputs = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 261, in forward
    logits, _ = self.output_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1560, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 943, in forward
    output_parallel = self._forward_impl(
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 668, in linear_with_grad_accumulation_and_async_allreduce
    return LinearWithGradAccumulationAndAsyncCommunication.apply(*args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 551, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/workspace/Megatron-LM/megatron/core/tensor_parallel/layers.py", line 439, in forward
    output = torch.matmul(total_input, weight.t())
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 7 has a total capacity of 79.10 GiB of which 1.63 GiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 74.52 GiB is allocated by PyTorch, and 204.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(Rank 0 hit the same torch.cuda.OutOfMemoryError on GPU 0: 1.80 GiB free, 74.53 GiB allocated by PyTorch.)
[2025-01-10 21:45:00,561] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 661) of binary: /usr/bin/python
I'm using two A3 Mega nodes.
OOM has nothing to do with the tokenizer. If the model does not fit on your GPUs, either use the SGD optimizer (--optimizer sgd), decrease the training batch size, or use the tensor parallel and pipeline parallel options. Try --tensor-model-parallel-size num_gpus, replacing num_gpus with the number of GPUs you have.
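For example, on 8 x 80 GB GPUs per node, something like the following could be a starting point (these values are assumptions, not a tested recipe; tune them for your setup):
--tensor-model-parallel-size 8
--pipeline-model-parallel-size 1
--micro-batch-size 1
--global-batch-size 128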
Marking as stale. No activity in 60 days.