
Add Descript-Audio-Codec model

Open kamilakesbi opened this issue 8 months ago • 5 comments

What does this PR do?

This PR adds the Descript-Audio-Codec (DAC) model, a high-fidelity, general-purpose neural audio codec, to the Transformers library.

The model has three components:

  • An Encoder model.
  • A ResidualVectorQuantizer model, which is used with the encoder to obtain the quantized latent audio codes.
  • A Decoder model, used to reconstruct the audio after compression.
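
To make the quantizer's role concrete, here is a minimal, pure-Python sketch of residual vector quantization: each stage picks the nearest codebook entry for whatever the previous stages left unexplained, and decoding sums the selected entries. This is a toy illustration, not the code in modeling_dac.py. The function names (`nearest`, `rvq_encode`, `rvq_decode`) and the tiny hand-written codebooks are assumptions made for the example, and the encoder and decoder networks are left out entirely.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, latent):
    """Quantize `latent` stage by stage; each stage codes the remaining residual."""
    residual = list(latent)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Rebuild the latent by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks over 2-dimensional latents.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.5, -0.5]],   # refinement stage
]
toy_codes = rvq_encode(toy_codebooks, [1.5, 0.5])
toy_reconstruction = rvq_decode(toy_codebooks, toy_codes)
```

In a real codec the latent comes from the encoder for each audio frame, the codebooks are learned, and the summed reconstruction is what the decoder turns back into audio.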

This is still a draft PR. Here's what I've done for now:

  1. Adapted the model to Transformers format in modeling_dac.py.
  2. Added the checkpoint conversion scripts and pushed the three models to the Hub here (16, 24, and 44 kHz).
  3. Made sure the forward pass gives the same output as the original model.
  4. Added a Feature Extractor (very similar to the Encodec FeatureExtractor).
  5. Started iterating on tests.
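
A check like the one in step 3 is often written as an element-wise comparison within a tolerance. The sketch below shows that idea in plain Python on flat lists of floats. The helper names (`max_abs_diff`, `outputs_match`) and the `1e-5` tolerance are assumptions for illustration; the PR does not state how the comparison was actually done or what tolerance was used.

```python
def max_abs_diff(reference, converted):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(converted):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, converted)), default=0.0)


def outputs_match(reference, converted, atol=1e-5):
    """True if every element of `converted` is within `atol` of `reference`.

    `atol` is a hypothetical tolerance, not a value taken from the PR.
    """
    return max_abs_diff(reference, converted) <= atol
```

With real models the same comparison would be run on the output tensors of the original and converted checkpoints for identical input audio.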

Who can review?

cc @sanchit-gandhi and @ArthurZucker; cc @ylacombe for visibility

kamilakesbi, Jun 19 '24 13:06