Export of Llama2 fails
I'm unable to use exporters for the meta-llama/Llama-2-7b-chat-hf model.
Here is my command:
python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
And here is the output:
% python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
Torch version 2.3.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.44s/it]
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/modeling_utils.py:4371: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1094: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/3690 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
ERROR - converting 'full' op (located at: 'model'):
Converting PyTorch Frontend ==> MIL Ops: 1%|▉ | 28/3690 [00:00<00:00, 5249.21 ops/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
mlmodel = export(
^^^^^^^
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 660, in export
return export_pytorch(preprocessor, model, config, quantize, compute_units)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
mlmodel = ct.convert(
^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 581, in convert
mlmodel = mil_convert(
^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
proto, mil_program = mil_convert_to_proto(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 288, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
return load(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 82, in load
return _perform_torch_convert(converter, debug)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 116, in _perform_torch_convert
prog = converter.convert()
^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 581, in convert
convert_nodes(self.context, self.graph)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 86, in convert_nodes
raise e # re-raise exception
^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 81, in convert_nodes
convert_single_node(context, node)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 134, in convert_single_node
add_op(context, node)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4211, in full
else NUM_TO_NUMPY_DTYPE[TORCH_DTYPE_TO_NUM[inputs[2].val]]
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 6
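For what it's worth, dtype code 6 is torch.float32 in PyTorch's enum, so the converter seems to be choking on an aten::full node whose dtype argument arrives as that raw integer. Below is a rough repro sketch of the pattern (my own construction, not code from exporters — Llama builds its causal mask roughly this way; it may not fail on every torch/coremltools combination):

import torch
import coremltools as ct

# Llama-style causal mask: torch.full(..., dtype=torch.float32). In the
# traced graph the dtype shows up as the integer code 6, which is the key
# the converter's lookup table is missing.
class MaskLike(torch.nn.Module):
    def forward(self, x):
        n = x.shape[-1]
        mask = torch.full((n, n), torch.finfo(torch.float32).min, dtype=torch.float32)
        return x + torch.triu(mask, diagonal=1)

traced = torch.jit.trace(MaskLike().eval(), torch.zeros(4, 4))
# On torch 2.3 + an older coremltools this is the kind of graph that hits
# the 'full' op converter; newer coremltools releases handle it.
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=(4, 4))])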
I was able to generate an mlpackage for distilbert-base-uncased-finetuned-sst-2-english with this command:
python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification models/defaults.mlpackage
so I have some confidence that the environment is correct and working.
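For anyone trying to compare environments, the relevant versions can be printed with a couple of lines:

import coremltools, torch, transformers
print("coremltools:", coremltools.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)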
I have the exact same issue
I get the same error when trying to convert: https://huggingface.co/HuggingFaceTB/SmolLM-1.7B
I'm wondering if this is one of the situations where the torch operation -> Core ML operation mapping doesn't work automatically (i.e., it requires us to write our own operator: https://apple.github.io/coremltools/docs-guides/source/custom-operators.html).
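For reference, overriding a torch op in coremltools looks roughly like this (an untested sketch of the registration mechanism; the body just hard-codes a float32 assumption for dtype code 6 and is not a vetted fix):

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op(torch_alias=["full"], override=True)
def full(context, node):
    # Illustrative only: map aten::full to MIL's fill op, ignoring the
    # dtype argument and assuming float32 output.
    inputs = _get_inputs(context, node)
    shape, fill_value = inputs[0], inputs[1]
    context.add(mb.fill(shape=shape, value=float(fill_value.val), name=node.name))

Note that register_torch_op and _get_inputs are coremltools internals, so this approach can break between releases.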
@rradjabi try installing coremltools 8 and a newer version of transformers! I was able to run this conversion just fine 👏 (with my own memory-fixing patch, of course).
@Proryanator Could you please provide more details about your fix? Thanks
Yeah sure! Let me collect the specific details (it was a bit complicated in the end).
In a nutshell though:
Out of Memory Issue
For some models (not even ones that were all that large), including llama2, I would get an out-of-memory error on my M3 Max w/ 36 GB of RAM. This happened when coremltools tried to load the converted model toward the end. I figured out that a one-line change to exporters fixed this for me; here is that change: https://github.com/huggingface/exporters/pull/83
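For illustration only (this is a general coremltools knob, not necessarily what that PR changes): the runtime load at the end of conversion can be skipped with skip_model_load, along these lines:

import torch
import coremltools as ct

class Tiny(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

traced = torch.jit.trace(Tiny().eval(), torch.zeros(1, 4))
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=(1, 4))], convert_to="mlprogram")
# Re-wrap the converted spec without loading it through the Core ML runtime;
# for multi-GB models this avoids the memory spike at the end of conversion.
lite = ct.models.MLModel(
    mlmodel.get_spec(), weights_dir=mlmodel.weights_dir, skip_model_load=True
)
lite.save("tiny.mlpackage")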
Unsupported 'full' op
It was either upgrading to coremltools 8.0b1, where this op issue went away, or using an older version of transformers that fixed it for me (I did both, so I can't say which one at the moment). Let me double-check and get back to you with specifics (pretty sure it was the transformers version, though).