Tracing mismatch during conversion of Whisper model to ONNX using torch.onnx.export
I'm trying to convert the Whisper model to ONNX. When exporting the encoder of the Whisper model to ONNX using torch.onnx.export:
import torch
from transformers import WhisperForConditionalGeneration

# Load the model whose encoder will be exported ("openai/whisper-base" is a placeholder checkpoint).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

mel = torch.zeros((1, 80, 3000))  # dummy log-mel spectrogram: (batch, n_mels, frames)
encoder = model.get_encoder().to('cpu')
audio_features = encoder(mel)  # sanity check: run the encoder once in eager mode

torch.onnx.export(
    encoder,
    mel,
    "whisper_encoder.onnx",
    input_names=["mel"],
    output_names=["output_features"],
)
It raises the following TracerWarnings:
/usr/local/lib/python3.8/dist-packages/transformers/models/whisper/modeling_whisper.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/whisper/modeling_whisper.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
Afterwards, the ONNX file is generated, but the resulting model at runtime (using Optimum) is slow, about 50% slower than the PyTorch run! I suspect the slowness of the ONNX model is due to the TracerWarnings.
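For reference, a minimal latency comparison along these lines (a sketch: it assumes onnxruntime is installed and reuses encoder and mel from the snippet above):

# Rough latency comparison (sketch): PyTorch encoder vs. the exported ONNX file.
import time

import numpy as np
import onnxruntime as ort
import torch

session = ort.InferenceSession("whisper_encoder.onnx", providers=["CPUExecutionProvider"])
mel_np = np.zeros((1, 80, 3000), dtype=np.float32)

def mean_latency(fn, runs=20):
    fn()  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

with torch.no_grad():
    pt_ms = mean_latency(lambda: encoder(mel)) * 1e3
ort_ms = mean_latency(lambda: session.run(None, {"mel": mel_np})) * 1e3
print(f"PyTorch: {pt_ms:.1f} ms | ONNX Runtime: {ort_ms:.1f} ms")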
Any ideas?
I'm using transformers==4.26.0, optimum==1.6.1, onnx==1.10.0, and torch==1.12.0+cu116.
Hi @hannan72! I recommend that you use Optimum for exporting Whisper to the ONNX format (it is basically a wrapper around torch.onnx.export, but it is tested and Whisper is supported). You can find more information in the docs: https://huggingface.co/docs/optimum/exporters/onnx/overview
If you encounter any issue, feel free to open an issue in the Optimum repo.
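For example, with the Optimum Python API (a rough sketch: from_transformers=True is the export flag in the optimum 1.6.x line you are on, and openai/whisper-tiny is just a placeholder checkpoint):

# Export Whisper to ONNX via Optimum and load it with ONNX Runtime.
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

ort_model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-tiny",   # placeholder; use your own checkpoint
    from_transformers=True,  # triggers the ONNX export (newer optimum versions use export=True)
)
ort_model.save_pretrained("whisper_onnx")  # writes the encoder/decoder ONNX files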
I have used Optimum, but I get the same warning, and the resulting ONNX model deployed with Optimum's ONNX Runtime integration is about 50% slower than the PyTorch model deployment.
Yes, I see you opened this issue in Optimum: https://github.com/huggingface/optimum/issues/827. I think it's best to wait for @fxmarty to take a look at it.
Regarding these warnings, I don't think they are the reason it is slow. They just mean that the expressions in the if statements will not be evaluated at runtime; the branch taken for the example input is baked into the trace as a constant, so the model may fail with different batch sizes.
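To illustrate what tracing does with such a check (a toy sketch, not Whisper-specific):

# A Python-level shape check is evaluated once at trace time and frozen into the graph.
import torch

class ShapeCheck(torch.nn.Module):
    def forward(self, x):
        # During tracing, this comparison typically emits a TracerWarning like the
        # ones above: only the branch taken for the example input is recorded.
        if x.size(0) == 2:
            return x * 2
        return x

traced = torch.jit.trace(ShapeCheck(), torch.ones(2, 3))
print(traced(torch.ones(4, 3)))  # still multiplied by 2: the branch was baked in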
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.