
export `audio-classification` Whisper to TFLite

Open Gabriel-Kissin opened this issue 1 year ago • 4 comments

Feature request

Export an audio-classification model based on OpenAI's Whisper to other formats (TFLite, ONNX).

Motivation

After loading OpenAI's whisper:

import transformers

# n_labels, labels2id, and id2labels describe the fine-tuning classes
model = transformers.WhisperForAudioClassification.from_pretrained(
    "openai/whisper-tiny",
    num_labels=n_labels,
    label2id=labels2id,
    id2label=id2labels,
)

and fine-tuning it for audio classification, I'd like to export the saved model.safetensors model to TFLite.
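For reference, the `num_labels`, `label2id`, and `id2label` arguments passed above are plain Python values that can be built from a list of class names. A minimal sketch (the class names here are hypothetical placeholders, not from the original report):

```python
# Hypothetical class names for an audio-classification fine-tune
labels = ["angry", "happy", "neutral", "sad"]

n_labels = len(labels)                                    # num_labels
labels2id = {label: i for i, label in enumerate(labels)}  # label2id
id2labels = {i: label for i, label in enumerate(labels)}  # id2label

print(n_labels)            # 4
print(labels2id["happy"])  # 1
print(id2labels[2])        # neutral
```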

Ways I've tried:

  1. Directly: running the command optimum-cli export tflite --task audio-classification ... fails with
ValueError: Unrecognized configuration class 
<class 'transformers.models.whisper.configuration_whisper.WhisperConfig'> 
for this kind of AutoModel: TFAutoModelForAudioClassification. 
Model type should be one of Wav2Vec2Config
  • The TFLite exporter only seems to accept audio-classification models built on Wav2Vec2, not on Whisper.
  2. Via ONNX: I've previously had difficulties exporting directly to TFLite, but was able to export to ONNX, convert that to TensorFlow, and then to TFLite. Trying that route this time, however (optimum-cli export onnx --task audio-classification ...), failed with
ValueError: Asked to export a whisper model for the task audio-classification, 
but the Optimum ONNX exporter only supports the tasks 
feature-extraction, feature-extraction-with-past, automatic-speech-recognition, 
automatic-speech-recognition-with-past for whisper. 
Please use a supported task. 
Please open an issue at https://github.com/huggingface/optimum/issues 
if you would like the task audio-classification to be supported 
in the ONNX export for whisper
  • Whisper is primarily an ASR model, which is presumably why the exporter supports only feature extraction and ASR. However, since Hugging Face Transformers enables fine-tuning Whisper as a classifier, it would be great if Optimum could support such models as well!

Any chance this can be added? Exporting both to TFLite and to ONNX would be useful.

Many thanks!!!

Your contribution

Apologies, I cannot contribute.

Gabriel-Kissin avatar Jan 27 '24 22:01 Gabriel-Kissin

Having the same problem. Even saving the model in the TensorFlow format is difficult, because I cannot save the encoder and decoder separately, and the ONNX-to-TFLite conversion adds custom ops that I don't want.

AnonymUnsichtbar avatar Feb 13 '24 19:02 AnonymUnsichtbar

@AnonymUnsichtbar @Gabriel-Kissin To be fair, the TFLite support is currently quite minimal; only a few simple architectures are supported ('albert', 'bert', 'camembert', 'convbert', 'deberta', 'deberta_v2', 'distilbert', 'electra', 'flaubert', 'mobilebert', 'mpnet', 'resnet', 'roberta', 'roformer', 'xlm', 'xlm_roberta').

We haven't added support for decoders/encoder-decoder yet.

tflite adds custom ops

I wonder if this is related to the fact that we use subgraphs in the ONNX graph to handle past key values (KV cache) in a single ONNX model. Have you tried exporting to TFLite from e.g. decoder_with_past_model.onnx instead of decoder_model_merged.onnx?

fxmarty avatar Feb 26 '24 13:02 fxmarty

I'm also interested in being able to export audio-classification whisper to onnx! This would be a huge help!!!

WeiXiaoSummer avatar Feb 26 '24 17:02 WeiXiaoSummer

Hi, export of Transformers Whisper to ONNX for the audio-classification task has been merged in https://github.com/huggingface/optimum/pull/1727, for example:

optimum-cli export onnx --model shhossain/whisper-tiny-bn-emo whisper_onnx

For TFLite, there are no short term plans but I am happy to review PRs.

fxmarty avatar Feb 28 '24 16:02 fxmarty