fish-speech

Which will be better: training with indices or with hiddens?

Open JohnHerry opened this issue 6 months ago • 0 comments

Hi, are there any experiments on speech input for LLM training? There are two kinds of input: the codebook index from the codec, as a single integer value, or the indexed cluster center of the codebook, as a vector. Is there any study showing which one better fits autoregressive LLM training?
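To make the two options concrete, here is a minimal NumPy sketch. Everything in it is an assumption for illustration only: the sizes, the random codebook, and the names `llm_embedding` and `projection` are hypothetical and are not taken from fish-speech or any particular codec.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
codebook_size, codec_dim, model_dim = 8, 4, 6

# Frozen codec codebook: one cluster center (vector) per index.
codebook = rng.normal(size=(codebook_size, codec_dim))

# Codec output for a short utterance: one codebook index per frame.
indices = np.array([3, 1, 7, 1])

# Option A: feed the integer indices.
# The LLM owns a trainable embedding table and looks each index up in it.
llm_embedding = rng.normal(size=(codebook_size, model_dim))
inputs_from_indices = llm_embedding[indices]

# Option B: feed the cluster centers ("hiddens").
# The codec vectors are fetched first, then mapped to the model width
# by a trainable linear projection.
projection = rng.normal(size=(codec_dim, model_dim))
inputs_from_hiddens = codebook[indices] @ projection

print(inputs_from_indices.shape, inputs_from_hiddens.shape)  # (4, 6) (4, 6)
```

In this single-codebook, linear-projection setup, option B amounts to an embedding table `codebook @ projection` whose rank is limited by `codec_dim` and which inherits the codec's geometry, while option A learns an unconstrained table from scratch. Whether either trains better in practice is exactly the open question.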

JohnHerry, Aug 22 '24 09:08