About VQ and the datasets

Open jiaweiru opened this issue 9 months ago • 0 comments

Hi, thanks a lot for the great work. Using tokens at different resolutions matches an intuitive understanding of the audio signal (something like a representation of steady-state features and transient features). I have a question: doesn't the use of coarse tokens lead to higher latency, since tokens at a lower sampling frequency need more audio to be buffered before they can be produced?
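To make the latency concern concrete, here is a rough sketch of the arithmetic. The token rates below are hypothetical placeholders, not the actual rates of the released models, and the calculation only counts the audio span covered by one token; it ignores any extra delay from the encoder's receptive field or lookahead.

```python
def min_buffer_ms(token_rate_hz: float) -> float:
    """Audio duration covered by a single token, in milliseconds."""
    if token_rate_hz <= 0:
        raise ValueError("token rate must be positive")
    return 1000.0 / token_rate_hz


def worst_case_buffer_ms(token_rates_hz: list[float]) -> float:
    """The coarsest (lowest-rate) level dictates the minimum buffering."""
    return max(min_buffer_ms(rate) for rate in token_rates_hz)


# Hypothetical token rates for three resolution levels, coarse to fine.
# These are assumptions for illustration only.
hypothetical_rates_hz = [10.0, 20.0, 40.0]

for rate in hypothetical_rates_hz:
    print(f"{rate:5.1f} Hz -> at least {min_buffer_ms(rate):6.1f} ms per token")

print(f"lower bound set by coarsest level: "
      f"{worst_case_buffer_ms(hypothetical_rates_hz):.1f} ms")
```

Under these assumed rates, the coarsest level alone would require buffering several times more audio than the finest one, which is the source of my question.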

Also, could you share which datasets the open pre-trained models were trained on? Much appreciated.

jiaweiru, May 07 '24 08:05