
Converting llama3 models with added tokens

l3utterfly opened this issue 1 year ago · 3 comments

Following up from this: https://github.com/pytorch/executorch/issues/3303

Converting finetuned llama3 models that keep the same special tokens as the base model works.

How can we convert llama3 finetunes which have added tokens? For example, this model: https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B is tuned on the ChatML format.

Converting it with the same script produces this error:

RuntimeError: Error(s) in loading state_dict for Transformer:
        size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([128260, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
        size mismatch for output.weight: copying a param with shape torch.Size([128260, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).

I believe this is due to the finetuned model having a different number of tokens in its vocabulary.
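
A quick way to confirm is to compare the embedding shape stored in the checkpoint against the vocab_size in params.json. Below is a minimal diagnostic sketch; the paths are placeholders, and it assumes the checkpoint is a plain state_dict containing the tok_embeddings.weight key from the error above:

import json
import torch

# Placeholder paths -- substitute the real checkpoint and params file.
checkpoint_path = "checkpoint.pth"
params_path = "params.json"

# Load on CPU; we only need to inspect tensor shapes.
state_dict = torch.load(checkpoint_path, map_location="cpu")

with open(params_path) as f:
    params = json.load(f)

ckpt_vocab = state_dict["tok_embeddings.weight"].shape[0]
print("checkpoint vocab size:", ckpt_vocab)                 # 128260 for this finetune
print("params.json vocab size:", params.get("vocab_size"))  # 128256 for base llama3

if ckpt_vocab != params.get("vocab_size"):
    print("mismatch: params.json needs to match the checkpoint")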

l3utterfly · May 06 '24 15:05

@l3utterfly Can you share the full error message? I thought it would happen at load_state_dict, but strict=False is passed there, so it shouldn't error out. https://github.com/pytorch/executorch/blob/main/examples/models/llama2/model.py#L197

larryliu0820 · May 06 '24 22:05

Full error here:

python -m examples.models.llama2.export_llama --checkpoint /home/layla/src/text-generation-webui/models/Einstein-v6.1-Llama3-8B/checkpoint.pth -p /home/layla/src/text-generation-webui/models/Meta-Llama-3-8B-Instruct/original/params.json -d=fp32 -X -qmode 8da4w -kv --use_sdpa_with_kv_cache --output_name="Einstein-v6.1-Llama3-8B_kv_sdpa_xnn_qe_4_32_ctx2048.pte" --group_size 256 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --max_seq_len 2048
[INFO 2024-05-07 11:19:11,348 builder.py:84] Loading model with checkpoint=/home/layla/src/text-generation-webui/models/Einstein-v6.1-Llama3-8B/checkpoint.pth, params=/home/layla/src/text-generation-webui/models/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
Traceback (most recent call last):
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/home/layla/src/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 302, in export_llama
    return _export_llama(modelname, args)
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 380, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
  File "/home/layla/src/executorch/examples/models/llama2/export_llama_lib.py", line 352, in _prepare_for_llama_export
    load_llama_model(
  File "/home/layla/src/executorch/examples/models/llama2/builder.py", line 87, in load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
  File "/home/layla/src/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
  File "/home/layla/src/executorch/examples/models/llama2/model.py", line 195, in __init__
    self.model_.load_state_dict(
  File "/home/layla/miniconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2191, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transformer:
        size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([128260, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
        size mismatch for output.weight: copying a param with shape torch.Size([128260, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
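
Note that the RuntimeError fires even though load_state_dict is called with strict=False: strict=False only suppresses missing and unexpected keys, while a shape mismatch on a key present in both models is still collected and raised. A minimal sketch reproducing this behavior outside of executorch:

import torch.nn as nn

# A 128256-entry model vs. a 128260-entry "checkpoint", mirroring the error above.
model = nn.Embedding(128256, 8)
checkpoint = nn.Embedding(128260, 8)

try:
    # strict=False ignores missing/unexpected keys, but a size mismatch on a
    # shared key ("weight") still raises RuntimeError.
    model.load_state_dict(checkpoint.state_dict(), strict=False)
except RuntimeError as e:
    print(e)  # size mismatch for weight: copying a param with shape ...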

l3utterfly · May 07 '24 11:05

@l3utterfly vocab_size is configurable. Can you change the value in /home/layla/src/text-generation-webui/models/Meta-Llama-3-8B-Instruct/original/params.json to the new one (128260, per the error above) and retry?
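
A minimal sketch of that edit, assuming params.json carries a vocab_size field (the base llama3 params file does); the path is a placeholder:

import json

params_path = "params.json"  # placeholder for the real path

with open(params_path) as f:
    params = json.load(f)

# Base llama3 ships with vocab_size = 128256; this finetune's checkpoint
# has 128260 rows in tok_embeddings.weight, so match it here.
params["vocab_size"] = 128260

with open(params_path, "w") as f:
    json.dump(params, f, indent=2)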

larryliu0820 · May 09 '24 18:05