FireRedTTS
How do I output the 48 kHz audio mentioned in the paper?
The paper says:
first converting semantic tokens into the Mel spectrogram via a Mel decoder, and then generating the audio with a high sampling rate of 48 kHz via a super-resolution neural vocoder.
However, the example in the README saves the audio at a sampling rate of 24000.
How can I get 48 kHz output instead?
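For context on why the README's 24000 matters, here is a minimal standard-library sketch. It is not FireRedTTS code. The rate passed when saving only sets the WAV header and does not resample anything. So changing 24000 to 48000 on 24 kHz output does not produce 48 kHz audio. It halves the duration and raises the pitch by an octave. A genuine 48 kHz waveform would have to come from the super-resolution vocoder the paper describes. The sine tone stands in for model output. `write_wav` and `duration_seconds` are hypothetical helpers for illustration only.

```python
import io
import math
import struct
import wave


def write_wav(samples, sample_rate):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)  # only header metadata; samples are untouched
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(frames)
    return buf.getvalue()


def duration_seconds(wav_bytes):
    """Playback duration implied by the WAV header: frames / frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# One second of a 440 Hz tone generated at 24 kHz (stand-in for 24 kHz model output).
sr_model = 24000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr_model) for n in range(sr_model)]

correct = write_wav(tone, 24000)     # header matches the data
mislabeled = write_wav(tone, 48000)  # same samples, wrong header rate

print(duration_seconds(correct))     # 1.0
print(duration_seconds(mislabeled))  # 0.5
```

Naively upsampling the 24 kHz samples to 48 kHz before saving would fix the duration and pitch. It would still add no content above 12 kHz, which is the gap a super-resolution vocoder is meant to fill.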