descript-audio-codec Details on training and inferencing the 16kHz variant

Hi, there! Thanks for the great work. I noticed in your paper that all audio samples are resampled to 44.1kHz, which I understand is the setting for the 44kHz variant.

I was wondering if you could kindly clarify how the 16kHz version works. Are the audio samples resampled to 16kHz for training? Or does the model take 16kHz input and resample it to 44kHz at inference time?

Some examples of how to train and use these variants would be a great help.

Aug 15 '23 10:08 JinchaoLove

This is the config for training the 16 kHz model: https://github.com/descriptinc/descript-audio-codec/blob/main/conf/final/16khz.yml

The input is always resampled to 16 kHz, and the output will always match the input sampling rate. The documentation should already contain the commands to launch training.
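For illustration only, here is a minimal sketch of that sample-rate flow as I read it: resample the input to 16 kHz, run the codec, then resample back to the original rate. This is not code from the repository. `resample_linear`, `round_trip`, and the identity `codec` placeholder are hypothetical names, and naive linear interpolation stands in for whatever resampler the library actually uses.

```python
from typing import Callable, List

MODEL_RATE = 16_000  # rate the 16 kHz variant operates at, per the reply above


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (illustration only, no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index into the source
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def round_trip(
    samples: List[float],
    input_rate: int,
    codec: Callable[[List[float]], List[float]] = lambda x: x,
) -> List[float]:
    """Resample to the model rate, apply a placeholder codec, resample back."""
    at_model_rate = resample_linear(samples, input_rate, MODEL_RATE)
    reconstructed = codec(at_model_rate)       # hypothetical stand-in for encode + decode
    return resample_linear(reconstructed, MODEL_RATE, input_rate)


one_second_441 = [0.0] * 44_100
print(len(resample_linear(one_second_441, 44_100, MODEL_RATE)))  # samples the model would see
print(len(round_trip(one_second_441, 44_100)))                   # samples returned to the caller
```

With a one-second 44.1 kHz clip, the model would see 16,000 samples and the caller would get 44,100 samples back, so the output sampling rate matches the input.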

Aug 19 '23 16:08 ritheshkumar95

I see, thanks a lot!👍

Aug 21 '23 05:08 JinchaoLove