Details on training and inference for the 16kHz variant
Hi, there! Thanks for the great work. I noticed in your paper that all audio samples are resampled to 44.1kHz, which I understand is the setting for the 44kHz variant.
I was wondering if you could kindly clarify how the 16kHz version works. Are the audio samples resampled to 16kHz for training? Or does the model take 16kHz input and then resample it to 44kHz at inference time?
Some examples of how to train and use these variants would be a great help.
This is the config for training the 16 kHz model: https://github.com/descriptinc/descript-audio-codec/blob/main/conf/final/16khz.yml
The input is always resampled to 16 kHz, and the output will always match the input sampling rate. The documentation should already contain the commands to launch training.
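To illustrate the sample-rate behaviour described above, here is a rough, standard-library-only sketch. It is not the codec's actual code: `resample_linear` and `roundtrip` are hypothetical helpers, the resampler is plain linear interpolation (a real pipeline would use a proper band-limited resampler), and the `codec` argument is an identity placeholder standing in for encode/decode.

```python
import math


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (crude, for illustration only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def roundtrip(samples, input_rate, model_rate=16_000, codec=lambda x: x):
    """Hypothetical wrapper: resample to the model rate, run the codec, resample back."""
    model_input = resample_linear(samples, input_rate, model_rate)
    model_output = codec(model_input)  # placeholder for the real encode/decode step
    restored = resample_linear(model_output, model_rate, input_rate)
    return restored, model_input


# One second of a 440 Hz sine sampled at 44.1 kHz.
input_rate = 44_100
signal = [math.sin(2 * math.pi * 440 * n / input_rate) for n in range(input_rate)]

restored, model_input = roundtrip(signal, input_rate)
print(len(model_input))  # number of samples the model sees at 16 kHz
print(len(restored))     # number of samples returned at the input rate
```

The point of the sketch is only the shape of the data flow: the model always sees 16 kHz audio, while the caller gets back audio at whatever rate they supplied.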
I see, thanks a lot!👍