transformers



Add Descript-Audio-Codec model

Open kamilakesbi opened this issue 8 months ago • 5 comments

What does this PR do?

This PR aims to add the Descript-Audio-Codec (DAC) model, a high-fidelity, general-purpose neural audio codec, to the Transformers library.

The model is composed of three components:

An Encoder model.
A ResidualVectorQuantizer model, which is used with the encoder to obtain the quantized latent codes of the audio.
A Decoder model, used to reconstruct the audio after compression.
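
To illustrate the role of the quantizer between the encoder and the decoder, here is a minimal sketch of residual vector quantization in plain Python. It is not the code from modeling_dac.py: the function names (`nearest_code`, `rvq_encode`, `rvq_decode`) and the toy codebook values are hypothetical, and the real quantizer works on learned codebooks over batched tensors and may differ in its details. Each stage picks the closest codebook entry for whatever the previous stages left unexplained, and decoding sums the selected entries.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def nearest_code(vector: Sequence[float], codebook: Codebook) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    distances = [
        sum((v - c) ** 2 for v, c in zip(vector, entry)) for entry in codebook
    ]
    return distances.index(min(distances))


def rvq_encode(latent: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Quantize `latent` stage by stage; each stage quantizes the previous residual."""
    residual = list(latent)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Rebuild the latent by summing the selected entry of every codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (hypothetical values): a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

codes, residual = rvq_encode([1.25, 1.0], CODEBOOKS)
reconstruction = rvq_decode(codes, CODEBOOKS)
print(codes, reconstruction, residual)
```

In this toy setup the integer `codes` are the compressed representation: one index per codebook stage. Adding more stages shrinks the residual at the cost of more codes per latent vector.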

This is still a draft PR. Here's what I've done so far:

Adapted the model to Transformers format in modeling_dac.py.
Added the checkpoint conversion scripts and pushed the 3 converted models (16, 24, and 44 kHz) to the Hub here.
Made sure the forward pass gives the same output as the original model.
Added a Feature Extractor (very similar to the Encodec FeatureExtractor).
Started iterating on tests.
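
As an illustration of the forward-pass check mentioned above, the sketch below compares two outputs element by element against an absolute tolerance. It is an assumption about how such a check can be written, not the test used in this PR: `max_abs_diff`, `outputs_match`, the sample values, and the tolerance are all hypothetical. With real tensors, a library comparison such as `torch.allclose` would usually replace these helpers.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(
    reference: Sequence[float], candidate: Sequence[float], atol: float = 1e-5
) -> bool:
    """True when every element of `candidate` is within `atol` of `reference`."""
    return max_abs_diff(reference, candidate) <= atol


# Hypothetical flattened outputs from the original and the ported model.
original_output = [0.1234567, -0.5, 0.25]
ported_output = [0.1234570, -0.5, 0.25]

print(outputs_match(original_output, ported_output))
```

Reporting the maximum difference, rather than only a pass/fail flag, makes it easier to tell a numerical-precision gap from a genuine porting bug.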

Who can review?

cc @sanchit-gandhi and @ArthurZucker cc @ylacombe for visibility

Jun 19 '24 13:06 kamilakesbi