                        Add support to export facebook encodec models to ONNX
Feature request
When I try to use optimum-cli to export the facebook/encodec_32khz model I get this error:
%  optimum-cli export onnx --model facebook/encodec_32khz encodec.onnx
Framework not specified. Using pt to export to ONNX.
/Users/micchig/micromamba/envs/music-representation/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "/Users/micchig/micromamba/envs/music-representation/bin/optimum-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/micchig/micromamba/envs/music-representation/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/Users/micchig/micromamba/envs/music-representation/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 246, in run
    main_export(
  File "/Users/micchig/micromamba/envs/music-representation/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 408, in main_export
    raise ValueError(
ValueError: Trying to export a encodec model, that is a custom or unsupported architecture for the task feature-extraction, but no custom onnx configuration was passed as `custom_onnx_configs`. Please refer to https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models for an example on how to export custom models. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the model type encodec to be supported natively in the ONNX export.
I am following the advice in the message and opening an issue here. :)
Motivation
I want to use the encodec model for inference, and I'd much rather use ONNX than import the pretrained model from transformers and run it in PyTorch every time, since ONNX is much faster.
Your contribution
I'm afraid I can't contribute to this personally.
Thank you @giamic, adding it to the todo list :)
Hi @giamic, this one is highly non-trivial. I'm working on it this week.
@xenova @giamic I am planning to export a model whose I/O is the same as https://github.com/huggingface/transformers/blob/f01e1609bf4dba146d1347c1368c8c49df8636f6/src/transformers/models/encodec/modeling_encodec.py#L575 and https://github.com/huggingface/transformers/blob/f01e1609bf4dba146d1347c1368c8c49df8636f6/src/transformers/models/encodec/modeling_encodec.py#L703. Does that sound fine to you for your use cases? Subparts (quantizer, etc.) would not be exported independently.
Thank you @fxmarty ! If I understand correctly, there would be two separate models: EncodecEncoder and EncodecDecoder. The Encoder would take an audio file and output its quantised representation, where every element of the output array would be a codebook index; and the decoder part would take the quantised representation and output an audio file.
I think this is generally good. What I haven't understood is whether we would get access to the codebooks to map the quantised representation back into the non-quantised latent space. (You said that the quantiser would not be exported independently, but maybe it's possible to just write the codebooks to file, so that we could at least do the decoding part of the quantiser ourselves.)
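For illustration, decoding quantised codes from exported codebooks would amount to a residual-vector-quantisation lookup: each quantizer's code indexes that quantizer's embedding table, and the looked-up embeddings are summed to recover the latent vector. A minimal sketch in plain Python, with made-up toy codebooks (in EnCodec the real tables live in the quantizer's weights):

```python
# Sketch: recovering a latent vector from RVQ codes using exported codebooks.
# The codebook values below are toy data, not EnCodec's actual weights.

def rvq_decode(codes, codebooks):
    """codes: one code index per quantizer.
    codebooks: one embedding table per quantizer (list of vectors).
    Returns the summed embeddings, i.e. the de-quantised latent vector."""
    dim = len(codebooks[0][0])
    latent = [0.0] * dim
    for q, code in enumerate(codes):
        vec = codebooks[q][code]
        latent = [a + b for a, b in zip(latent, vec)]
    return latent

# Two quantizers, each with a 4-entry codebook of 3-dim vectors (toy data).
codebooks = [
    [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3], [0.1, 0.1, 0.1]],
    [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.05], [0.02, 0.02, 0.02]],
]
latent = rvq_decode([1, 2], codebooks)  # [0.0, 0.2, 0.05]
```

This is just the residual-sum step; the exported decoder would then map such latent vectors back to audio.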
@giamic Exactly, specifically, I was thinking there would be (following the above encode & decode functions):
- encodec_encode.onnx that takes input_values (audio), returns encoded_frames of shape (nb_frames, batch_size, num_quantizers, chunk_length)
- encodec_decode.onnx that takes audio_codes inputs ((1, batch_size, num_quantizers, chunk_length)), returns audio_values.
I think what you call "codebooks" is audio_codes? So that would be fine.
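Given those shapes, driving the two exported files would presumably mean looping over the leading frame axis of encoded_frames and feeding one (1, batch_size, num_quantizers, chunk_length) slice at a time to the decoder. A rough sketch of that bookkeeping; decode_frame here is a hypothetical stand-in (a real pipeline would run encodec_decode.onnx, e.g. via an onnxruntime session) that only returns placeholder samples so the shape handling is visible:

```python
# Sketch of the frame loop implied by the proposed I/O. `decode_frame` is a
# hypothetical stand-in for running encodec_decode.onnx on a single frame.

def decode_frame(audio_codes):
    # audio_codes: nested list of shape (1, batch_size, num_quantizers, chunk_length)
    assert len(audio_codes) == 1
    chunk_length = len(audio_codes[0][0][0])
    # A real decoder would return waveform samples; we return placeholders.
    return [0.0] * chunk_length

def decode_all(encoded_frames):
    # encoded_frames: shape (nb_frames, batch_size, num_quantizers, chunk_length)
    audio = []
    for frame in encoded_frames:
        # Re-wrap each frame into the (1, batch, quantizers, chunk) layout the
        # decoder expects, and concatenate the resulting audio along time.
        audio.extend(decode_frame([frame]))
    return audio

# Toy input: 2 frames, batch of 1, 2 quantizers, chunk length 3.
encoded_frames = [
    [[[0, 1, 2], [3, 0, 1]]],
    [[[2, 2, 0], [1, 1, 3]]],
]
audio_values = decode_all(encoded_frames)  # 2 frames * 3 samples = 6 samples
```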