Add Descript-Audio-Codec model
What does this PR do?
This PR aims to add the Descript-Audio-Codec (DAC) model, a high-fidelity general neural audio codec, to the Transformers library.
This model is composed of 3 components:
- An Encoder model.
- A ResidualVectorQuantizer model, which is used with the encoder to obtain the quantized latent codes of the audio.
- A Decoder model, used to reconstruct the audio after compression.
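To illustrate the quantizer step only, here is a toy, pure-Python sketch of residual vector quantization. It is not the code in this PR: the function names (`nearest_code`, `rvq_encode`, `rvq_decode`) and the tiny codebooks are hypothetical, and in the real model the latent vectors come from the neural encoder and the reconstructed latents go to the neural decoder.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def squared_distance(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def rvq_encode(latent, codebooks):
    """Quantize `latent` stage by stage; each stage quantizes the previous residual."""
    residual = list(latent)
    codes = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Rebuild the quantized latent by summing the selected entry of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage setup: a coarse codebook followed by a finer one.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.0]],
]
toy_latent = [1.4, 1.0]  # stands in for one encoder output frame

toy_codes = rvq_encode(toy_latent, toy_codebooks)
toy_reconstruction = rvq_decode(toy_codes, toy_codebooks)
print(toy_codes, toy_reconstruction)
```

Each stage stores one integer per frame, so the list of codes is the compressed representation, and adding more stages refines the approximation of the latent.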
This is still a draft PR. Here's what I've done so far:
- Adapted the model to the Transformers format in `modeling_dac.py`.
- Added the checkpoint conversion scripts, and pushed the 3 models (16, 24 and 44 kHz) to the Hub.
- Made sure the forward pass gives the same output as the original model.
- Added a Feature Extractor (very similar to the Encodec FeatureExtractor).
- Started iterating on tests.
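A forward-pass parity check like the one mentioned in the list above can be sketched as follows. This is only an illustration, not the check used in this PR: the helper names and the `1e-5` tolerance are assumptions, and the two lists stand in for the flattened outputs of the original and the ported model on the same input.

```python
def max_abs_diff(reference, ported):
    """Largest element-wise absolute difference between two flat output lists."""
    if len(reference) != len(ported):
        raise ValueError("outputs must have the same length")
    return max((abs(r - p) for r, p in zip(reference, ported)), default=0.0)


def outputs_match(reference, ported, atol=1e-5):
    """True when every element of `ported` is within `atol` of `reference`."""
    return max_abs_diff(reference, ported) <= atol


# Stand-in values; real checks would compare the two models' actual outputs.
reference_output = [0.10, -0.25, 0.70]
ported_output = [0.10, -0.25, 0.70 + 1e-7]

print(outputs_match(reference_output, ported_output))
```

Reporting the maximum absolute difference, rather than only a pass/fail flag, makes it easier to tell numerical noise apart from a real porting bug.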
Who can review?
cc @sanchit-gandhi and @ArthurZucker, and cc @ylacombe for visibility