How can I output the 48 kHz audio mentioned in the paper?

Open anttxs opened this issue 9 months ago • 0 comments

The paper states:

first converting semantic tokens into the Mel spectrogram via a Mel decoder, and then generating the audio with a high sampling rate of 48 kHz via a super-resolution neural vocoder.

However, the example in the README saves the audio at a sample rate of 24000 Hz.
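
As a stopgap, I could upsample the 24 kHz output myself. The sketch below only illustrates that idea and is not the super-resolution vocoder from the paper. The sine tone is a hypothetical stand-in for the model output, and `upsample_2x_linear` and `to_wav_bytes` are helper names I made up. It uses only the Python standard library.

```python
import io
import math
import wave

SRC_RATE = 24000  # sample rate used by the README example
DST_RATE = 48000  # sample rate mentioned in the paper


def upsample_2x_linear(samples):
    """Double the sample count by inserting the midpoint between neighbours."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) / 2.0)
    return out


def to_wav_bytes(samples, rate):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        frames = bytearray()
        for s in samples:
            v = int(max(-1.0, min(1.0, s)) * 32767)
            frames += v.to_bytes(2, "little", signed=True)
        w.writeframes(bytes(frames))
    return buf.getvalue()


# Hypothetical stand-in for the model output: 0.1 s of a 440 Hz tone at 24 kHz.
audio_24k = [
    0.5 * math.sin(2 * math.pi * 440 * n / SRC_RATE)
    for n in range(SRC_RATE // 10)
]

# Same duration, twice as many samples, written with a 48 kHz header.
audio_48k = upsample_2x_linear(audio_24k)
wav_48k = to_wav_bytes(audio_48k, DST_RATE)
```

This produces a file whose header says 48 kHz, but interpolation cannot recover real content above 12 kHz (the Nyquist limit of the 24 kHz signal), so it is not equivalent to what the paper describes.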

How can I output 48 kHz audio?

Apr 11 '25 04:04 anttxs