
RuntimeError: mmap can only be used with files saved with torch.save and _use_new_zipfile_serialization=True

MisssssXie opened this issue · 10 comments

🐛 Describe the bug

I followed the steps from https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md.

The installation steps are as follows:

git clone https://github.com/pytorch/executorch.git   # main branch, 2024/9/2
cd executorch
git submodule sync
git submodule update --init
python3 -m venv .llama && source .llama/bin/activate
pip install torch torchvision torchaudio
pip install setuptools wheel
./install_requirements.sh --pybind xnnpack

Then I followed Step 2 (Prepare model) and chose Option C (Download and export the Llama 3 8B Instruct model from Hugging Face) to run llama3 with the following script:

python3 -m examples.models.llama2.export_llama --checkpoint Meta-Llama-3-8B-Instruct/original/consolidated.00.pth -p Meta-Llama-3-8B-Instruct/original/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w  --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"

I'm getting the following runtime error:

[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:450] Applying quantizers: []
[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:646] Loading model with checkpoint=Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
/Users/MyPath/executorch/examples/models/llama2/model.py:100: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 352, in export_llama
    builder = _export_llama(modelname, args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 474, in _export_llama
    _prepare_for_llama_export(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 414, in _prepare_for_llama_export
    _load_llama_model(
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 649, in _load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/model.py", line 100, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/.llama/lib/python3.12/site-packages/torch/serialization.py", line 1284, in load
    raise RuntimeError(
RuntimeError: mmap can only be used with files saved with `torch.save(Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, _use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.
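
From what I understand, the error is asking for the checkpoint to be in the new zipfile-based torch.save format. If the .pth really were a valid legacy checkpoint, a one-off re-save along these lines should make mmap usable (just a sketch reusing the path from my command above; I haven't confirmed this is the right fix for the official checkpoint):

import torch

# Load the checkpoint without mmap and re-save it in the zipfile-based
# format so that torch.load(..., mmap=True) works afterwards.
# Assumes the .pth is a valid legacy torch.save file to begin with.
ckpt_path = "Meta-Llama-3-8B-Instruct/original/consolidated.00.pth"
state_dict = torch.load(ckpt_path, map_location="cpu", mmap=False)
torch.save(state_dict, ckpt_path, _use_new_zipfile_serialization=True)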

Versions

env: M1 MacBook Pro
macOS: 14.5
python: 3.12
torch: 2.5.0.dev20240829 (https://pypi.org/simple, https://download.pytorch.org/whl/nightly/cpu)
cmake: 3.30.2
wheel: 0.44.0

MisssssXie avatar Sep 03 '24 00:09 MisssssXie

@iseeyuan @larryliu0820

JacobSzwejbka avatar Sep 05 '24 16:09 JacobSzwejbka

Can you try setting mmap=False in /Users/MyPath/executorch/examples/models/llama2/model.py:100

larryliu0820 avatar Sep 05 '24 22:09 larryliu0820

Can you try setting mmap=False in /Users/MyPath/executorch/examples/models/llama2/model.py:100

Of course, I've tried that as well, but it still failed.

[INFO 2024-09-02 10:16:18,296 export_llama_lib.py:450] Applying quantizers: []
[INFO 2024-09-02 10:16:18,296 export_llama_lib.py:646] Loading model with checkpoint=Meta-Llama-3-8B/original/consolidated.00.pth, params=Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
/Users/MyPath/executorch/examples/models/llama2/model.py:100: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=False)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 352, in export_llama
    builder = _export_llama(modelname, args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 474, in _export_llama
    _prepare_for_llama_export(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 414, in _prepare_for_llama_export
    _load_llama_model(
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 649, in _load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/model.py", line 100, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=device, mmap=False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/.llama/lib/python3.12/site-packages/torch/serialization.py", line 1299, in load
    return _legacy_load(
           ^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/.llama/lib/python3.12/site-packages/torch/serialization.py", line 1543, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.

MisssssXie avatar Sep 06 '24 00:09 MisssssXie

@MisssssXie let me try to repro this

larryliu0820 avatar Sep 06 '24 01:09 larryliu0820

Hi, may I ask if there has been any progress or a solution for this?

MisssssXie avatar Sep 13 '24 05:09 MisssssXie

@MisssssXie oops, sorry, I was quite busy these past two weeks. I'm repro'ing it right now.

larryliu0820 avatar Sep 13 '24 17:09 larryliu0820

I resolved this issue by downloading the .pth file directly from the repo instead of using the .pth file that came from git clone. [Screenshot 2024-09-17 at 03:36:03]

phamnhuvu-dev avatar Sep 16 '24 20:09 phamnhuvu-dev

@MisssssXie I think the issue is in how you downloaded the model, and I think @phamnhuvu-dev is correct that you need to download it from git clone. That said, the fastest download method I used is:

  • pip install huggingface-cli hf_transfer
  • huggingface-cli login
  • HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir . --include original/consolidated.00.pth

Reference: https://huggingface.co/docs/hub/en/models-downloading. For me, these commands produce the file original/consolidated.00.pth. Then you can run the export command.
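
If you prefer doing the download from Python, roughly the same thing with the huggingface_hub API should also work (a sketch; assumes huggingface_hub is installed and you are logged in with a token that has accepted the Llama 3 license):

from huggingface_hub import hf_hub_download

# Downloads only the consolidated checkpoint into ./original/
path = hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    filename="original/consolidated.00.pth",
    local_dir=".",
)
print(path)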

Note: Please don't use curl -O or wget directly on https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/original/consolidated.00.pth.

Let me know if this is helpful

larryliu0820 avatar Sep 23 '24 19:09 larryliu0820

Unfortunately, it still failed.

MisssssXie avatar Oct 02 '24 03:10 MisssssXie

@MisssssXie can you share more information? For example, can you make sure the checksum of your checkpoint is valid?
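
For example, something along these lines should tell you whether the file is a real checkpoint and whether its hash matches the SHA-256 shown on the file's page on huggingface.co (just a sketch; adjust the path to your layout):

import hashlib

ckpt_path = "Meta-Llama-3-8B-Instruct/original/consolidated.00.pth"

# A Git LFS pointer file starts with "version https://git-lfs..." instead of
# real weights, which would also explain the "invalid load key, 'v'" above.
with open(ckpt_path, "rb") as f:
    head = f.read(64)
print("looks like an LFS pointer:", head.startswith(b"version https://git-lfs"))

# Compare this against the SHA-256 listed for the file on Hugging Face.
sha256 = hashlib.sha256()
with open(ckpt_path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)
print("sha256:", sha256.hexdigest())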

larryliu0820 avatar Oct 02 '24 03:10 larryliu0820