Amphion [Feature]: speed up maskgct

Is your feature request related to a problem? Please describe.

When I generate 20 s of audio, it takes about 1 minute, which does not meet my requirements.

Describe the solution you'd like

I expect the generation time to be reduced to 10 s.


Nov 20 '24 06:11 hjc3613

OMG! The time is 1200S when I ran the maskgct demo.

Nov 23 '24 10:11 SunnyTian

We are actively developing a faster version, along with more fast TTS models. Thanks for your attention; we'll release them soon, before the new year.

Nov 24 '24 03:11 jiaqili3

OMG! The time is 1200S when I ran the maskgct demo.

Could you kindly check that you're using a GPU for inference? Otherwise it will be very slow. Thanks!
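For a quick check, something like the following minimal sketch may help. It assumes the demo runs on PyTorch (an assumption; this thread only mentions CUDA), and the helper name `describe_device` is hypothetical, not part of the demo.

```python
def describe_device() -> str:
    """Report which device PyTorch would pick, or note that PyTorch is missing."""
    try:
        import torch  # assumption: the demo is built on PyTorch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Index 0 is the default CUDA device.
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


if __name__ == "__main__":
    print(describe_device())
```

If this reports `cpu`, inference is probably falling back to the CPU, which would explain a very long run time. Note that a visible GPU alone does not prove the model and its inputs were actually moved to it.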

Nov 24 '24 03:11 jiaqili3

OMG! The time is 1200S when I ran the maskgct demo.

Could you kindly check that you're using a GPU for inference? Otherwise it will be very slow. Thanks!

Yes, I can see that CUDA is available and GPU usage is 100%, but the GPU only seems to start working after the "maskgct_inference" function has been running for a long time.

Nov 26 '24 14:11 SunnyTian

Yes, I can see that CUDA is available and GPU usage is 100%, but the GPU only seems to start working after the "maskgct_inference" function has been running for a long time.

You are probably running it for the first time. On the first run, the pretrained models are downloaded separately, so it takes time.
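To see whether the time goes to one-time setup or to inference itself, one option is to time each stage separately and compare a first run with a second one. The sketch below is only an illustration: `time_stages` is a hypothetical helper, and the `time.sleep` calls are placeholders to be swapped for the real loading and inference calls.

```python
import time
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each stage once, in order, and return its wall-clock time in seconds."""
    durations: Dict[str, float] = {}
    for name, run in stages.items():
        start = time.perf_counter()
        run()
        durations[name] = time.perf_counter() - start
    return durations


if __name__ == "__main__":
    # Placeholder stages: replace each lambda with the real call.
    report = time_stages({
        "load models (includes any first-run download)": lambda: time.sleep(0.2),
        "first inference": lambda: time.sleep(0.1),
        "second inference": lambda: time.sleep(0.05),
    })
    for name, seconds in report.items():
        print(f"{name}: {seconds:.2f} s")
```

If the loading stage dominates and shrinks on a later run, the delay is likely the download and model setup rather than generation. When timing GPU work with PyTorch, calling `torch.cuda.synchronize()` at the end of each stage keeps asynchronous kernels from being attributed to the wrong stage.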

Dec 06 '24 07:12 wen0320