howitry

3 comments by howitry

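At inference time, aren't the HuBERT+VQ tokens predicted autoregressively from the text plus the discrete SSL tokens of the reference audio? If so, shouldn't the HuBERT+VQ tokens generated at inference already carry the reference speaker's timbre? Why would this reduce timbre leakage?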

> we use speech tokenizer, which means we must use flow model to reconstruct the mel sequence

I understand that flow is used to transform code to mel. But the...

> > When using the 40 tokens/s configuration, although the quality of the reconstructed audio is very good, there are often some mispronunciations. Have you measured the CER performance of...