Tracing mismatch during conversion of Whisper model to ONNX using torch.onnx.export
I'm trying to convert the Whisper model to ONNX. When exporting the encoder of the Whisper model to ONNX using torch.onnx.export:
import torch
from transformers import WhisperForConditionalGeneration

# Load the model whose encoder will be exported ("openai/whisper-base" is a placeholder checkpoint).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

mel = torch.zeros((1, 80, 3000))  # dummy log-mel spectrogram: (batch, n_mels, frames)
encoder = model.get_encoder().to('cpu')
audio_features = encoder(mel)  # sanity check: run the encoder once in eager mode

torch.onnx.export(
    encoder,
    mel,
    "whisper_encoder.onnx",
    input_names=["mel"],
    output_names=["output_features"],
)
It raises the following TracerWarnings:
/usr/local/lib/python3.8/dist-packages/transformers/models/whisper/modeling_whisper.py:207: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/whisper/modeling_whisper.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
Afterwards, the ONNX file is generated, but the resulting model at runtime (using Optimum) is slow, about 50% slower than the PyTorch run! I suspect the slowness of the ONNX model is due to the TracerWarnings.
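For reference, a minimal latency comparison along these lines (a sketch: it assumes onnxruntime is installed and reuses encoder and mel from the snippet above):

# Rough latency comparison (sketch): PyTorch encoder vs. the exported ONNX file.
import time

import numpy as np
import onnxruntime as ort
import torch

session = ort.InferenceSession("whisper_encoder.onnx", providers=["CPUExecutionProvider"])
mel_np = np.zeros((1, 80, 3000), dtype=np.float32)

def mean_latency(fn, runs=20):
    fn()  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

with torch.no_grad():
    pt_ms = mean_latency(lambda: encoder(mel)) * 1e3
ort_ms = mean_latency(lambda: session.run(None, {"mel": mel_np})) * 1e3
print(f"PyTorch: {pt_ms:.1f} ms | ONNX Runtime: {ort_ms:.1f} ms")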
Any ideas?
I'm using transformers==4.26.0, optimum==1.6.1, onnx==1.10.0, and torch==1.12.0+cu116.
Hi @hannan72! I recommend that you use Optimum for exporting Whisper to the ONNX format (it is basically a wrapper around torch.onnx.export, but it is tested and Whisper is supported). You can find more information in the docs: https://huggingface.co/docs/optimum/exporters/onnx/overview
If you encounter any issue, feel free to open an issue in the Optimum repo.
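For example, with the Optimum Python API (a rough sketch: from_transformers=True is the export flag in the optimum 1.6.x line you are on, and openai/whisper-tiny is just a placeholder checkpoint):

# Export Whisper to ONNX via Optimum and load it with ONNX Runtime.
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

ort_model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-tiny",   # placeholder; use your own checkpoint
    from_transformers=True,  # triggers the ONNX export (newer optimum versions use export=True)
)
ort_model.save_pretrained("whisper_onnx")  # writes the encoder/decoder ONNX files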
I have used Optimum, but I get the same warning, and the resulting ONNX model deployed with Optimum's ONNX Runtime integration is about 50% slower than the PyTorch model deployment.
Yes, I see you opened this issue in Optimum: https://github.com/huggingface/optimum/issues/827. I think it's best to wait for @fxmarty to take a look at it.
Regarding these warnings, I don't think they are the reason it is slow. They just mean that the expressions in the if statements will not be evaluated at runtime; the branch taken for the example input is baked into the trace as a constant, so the model may fail with different batch sizes.
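To illustrate what tracing does with such a check (a toy sketch, not Whisper-specific):

# A Python-level shape check is evaluated once at trace time and frozen into the graph.
import torch

class ShapeCheck(torch.nn.Module):
    def forward(self, x):
        # During tracing, this comparison typically emits a TracerWarning like the
        # ones above: only the branch taken for the example input is recorded.
        if x.size(0) == 2:
            return x * 2
        return x

traced = torch.jit.trace(ShapeCheck(), torch.ones(2, 3))
print(traced(torch.ones(4, 3)))  # still multiplied by 2: the branch was baked in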
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.