Add Cohere ONNX export support
What does this PR do?
This PR adds ONNX export support for Cohere models (similar to Llama). It does, however, require one patch in transformers, due to a problematic torch.repeat_interleave op within CohereRotaryEmbedding:
 with torch.autocast(device_type=device_type, enabled=False):
     freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
-    emb = torch.repeat_interleave(freqs, 2, dim=-1)
+    emb = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)
     cos = emb.cos()
     sin = emb.sin()
I suspect this is a bug in torch/onnx; maybe @fxmarty can confirm? Also cc @saurabhdash2512, who contributed the model in https://github.com/huggingface/transformers/pull/29622.
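For reference, the replacement expression is numerically identical to the repeat_interleave it replaces; a minimal sanity check (illustrative shapes only, not part of the PR):

import torch

# freqs has shape (batch, seq_len, dim // 2); the exact values don't matter here.
freqs = torch.randn(2, 16, 4)

a = torch.repeat_interleave(freqs, 2, dim=-1)
b = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)

# Both interleave each frequency twice along the last dimension: [f0, f0, f1, f1, ...]
assert torch.equal(a, b)
print(a.shape)  # torch.Size([2, 16, 8])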
Export logs without the change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
2024-06-12 12:23:32.921382493 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Traceback (most recent call last):
File "/home/codespace/.python/current/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/workspaces/optimum/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/workspaces/optimum/optimum/commands/export/onnx.py", line 265, in run
main_export(
File "/workspaces/optimum/optimum/exporters/onnx/__main__.py", line 352, in main_export
onnx_export_from_model(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 776, in export_models
export(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 910, in export
config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
File "/workspaces/optimum/optimum/exporters/onnx/base.py", line 335, in fix_dynamic_axes
outputs = session.run(None, onnx_inputs)
File "/usr/local/python/3.10.13/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Export logs with the change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (last_hidden_state, present.0.value, present.1.key, present.0.key, present.1.value)
- Validating ONNX Model output "last_hidden_state":
-[✓] (2, 16, 32) matches (2, 16, 32)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.0.key":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.0.value":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.key":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.value":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o
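As a quick sanity check, the graph exported to the o directory can also be inspected directly with ONNX Runtime; a minimal sketch (the input names mentioned in the comments are the usual ones for a decoder with KV cache and are an assumption here):

import onnxruntime as ort

# Load the model produced by the export command above.
session = ort.InferenceSession("o/model.onnx", providers=["CPUExecutionProvider"])

# Expected inputs (assumption): input_ids, attention_mask, position_ids, past_key_values.*
print([i.name for i in session.get_inputs()])
# Outputs should match the validation above: last_hidden_state, present.*.key, present.*.value
print([o.name for o in session.get_outputs()])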
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?