Add Cohere ONNX export support
What does this PR do?
This PR adds ONNX export support for Cohere models (similar to Llama). It does, however, require one patch in transformers, due to a problematic torch.repeat_interleave op within CohereRotaryEmbedding:
 with torch.autocast(device_type=device_type, enabled=False):
     freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
-    emb = torch.repeat_interleave(freqs, 2, dim=-1)
+    emb = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)
     cos = emb.cos()
     sin = emb.sin()
I suspect this is a bug in torch/onnx; maybe @fxmarty can confirm? Also cc @saurabhdash2512, who contributed the model in https://github.com/huggingface/transformers/pull/29622.
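For reference, the replacement expression is numerically identical to the repeat_interleave it replaces; a minimal sanity check (illustrative shapes only, not part of the PR):

import torch

# freqs has shape (batch, seq_len, dim // 2); the exact values don't matter here.
freqs = torch.randn(2, 16, 4)

a = torch.repeat_interleave(freqs, 2, dim=-1)
b = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)

# Both interleave each frequency twice along the last dimension: [f0, f0, f1, f1, ...]
assert torch.equal(a, b)
print(a.shape)  # torch.Size([2, 16, 8])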
Export logs without the change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
2024-06-12 12:23:32.921382493 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Traceback (most recent call last):
File "/home/codespace/.python/current/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/workspaces/optimum/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/workspaces/optimum/optimum/commands/export/onnx.py", line 265, in run
main_export(
File "/workspaces/optimum/optimum/exporters/onnx/__main__.py", line 352, in main_export
onnx_export_from_model(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 776, in export_models
export(
File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 910, in export
config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
File "/workspaces/optimum/optimum/exporters/onnx/base.py", line 335, in fix_dynamic_axes
outputs = session.run(None, onnx_inputs)
File "/usr/local/python/3.10.13/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Export logs with the change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using the export variant default. Available variants are:
- default: The default ONNX variant.
***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
- use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (last_hidden_state, present.0.value, present.1.key, present.0.key, present.1.value)
- Validating ONNX Model output "last_hidden_state":
-[✓] (2, 16, 32) matches (2, 16, 32)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.0.key":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.0.value":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.key":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
- Validating ONNX Model output "present.1.value":
-[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o
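As a quick sanity check, the graph exported to the o directory can also be inspected directly with ONNX Runtime; a minimal sketch (the input names mentioned in the comments are the usual ones for a decoder with KV cache and are an assumption here):

import onnxruntime as ort

# Load the model produced by the export command above.
session = ort.InferenceSession("o/model.onnx", providers=["CPUExecutionProvider"])

# Expected inputs (assumption): input_ids, attention_mask, position_ids, past_key_values.*
print([i.name for i in session.get_inputs()])
# Outputs should match the validation above: last_hidden_state, present.*.key, present.*.value
print([o.name for o in session.get_outputs()])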
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?