VoiceCraft
A few questions about the paper [Encodec; inference speed; model parameters]
Hi @jasonppy , great work and samples, thanks for sharing the code!
The introduction of causal masking for TTS is an elegant approach to contextualization. Bravo!
I'm curious about a few aspects of your work at the moment:
- Did you train Encodec as well? To my knowledge, pretrained parameters have already been released, but looking into your code, it seems that you trained it yourself. I wonder what the reason for this might be. A hypothesis: no released parameters for a 16 kHz sampling rate? (See the sketch after this list.)
- When it comes to inference, you mention that you run it multiple times. Can you share the inference speed for, say, a 10-second utterance on the 820M model?
- Is there any estimate of when the model parameters will be released?
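Regarding the 16 kHz hypothesis: as far as I know, the checkpoints shipped with the public `facebookresearch/encodec` package target 24 kHz and 48 kHz audio, so a 16 kHz model would have to be trained separately. A minimal sketch to check this, assuming the `encodec` pip package is installed:

```python
# Inspect the sampling rates of the publicly released Encodec checkpoints.
# This only illustrates that 24 kHz and 48 kHz models are shipped with the
# package, i.e. no 16 kHz checkpoint is available out of the box.
from encodec import EncodecModel

model_24k = EncodecModel.encodec_model_24khz()
model_48k = EncodecModel.encodec_model_48khz()

print(model_24k.sample_rate)  # 24000
print(model_48k.sample_rate)  # 48000
```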
Have a good one! Best, Taras
Thanks!
- Did you train Encodec as well? To my knowledge, pretrained parameters have already been released, but looking into your code, it seems that you trained it yourself. I wonder what the reason for this might be. A hypothesis: no released parameters for a 16 kHz sampling rate?
Yes, we trained Encodec as well. We will also open-source the trained Encodec model.
- When it comes to inference, you mention that you run it multiple times. Can you share the inference speed for, say, a 10-second utterance on the 820M model?
For the 830M model, generation is faster than real time for a 10-second utterance on an A40 GPU. More details will be added to the camera-ready paper.
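For reference, a rough way to check the real-time factor yourself once the weights are out (the `model`, `prompt`, and `generate` names below are hypothetical placeholders, not the actual VoiceCraft API):

```python
import time
import torch

# Hypothetical placeholders: `model`, `prompt`, and `target_text` stand in for
# the actual VoiceCraft objects once the weights are released.
utterance_duration_s = 10.0  # length of the target utterance in seconds

if torch.cuda.is_available():
    torch.cuda.synchronize()
start = time.time()
# audio = model.generate(prompt, target_text)  # hypothetical call
if torch.cuda.is_available():
    torch.cuda.synchronize()
elapsed = time.time() - start

# Real-time factor: values below 1.0 mean generation is faster than real time.
rtf = elapsed / utterance_duration_s
print(f"RTF: {rtf:.2f}")
```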
- Is there any estimate of when the model parameters will be released?
Model parameters will be released by the end of this month.
Thanks, looking forward to seeing more details!
Hi @jasonppy, can you explain why you chose to retrain Encodec instead of using the released model? Are 8 codebooks too many?