UmerrAhsan

Results: 7 comments by UmerrAhsan

StyleTTS2 consists of several models and components that work together to generate audio. To optimize it using TensorRT, you first need to convert each model separately from PyTorch to ONNX....
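As a rough illustration of that first step, here is a minimal sketch of exporting one submodule (the decoder) to ONNX. The decoder's actual `forward()` signature and the dummy shapes below are assumptions and must be replaced with the ones from the StyleTTS2 fork you use:

```python
import torch

def export_decoder_to_onnx(decoder, out_path="decoder.onnx"):
    """Export a StyleTTS2-style decoder submodule to ONNX.
    The input names and dummy shapes below are placeholders -- match them
    to whatever your decoder's forward() actually expects."""
    decoder = decoder.eval()

    asr_features = torch.randn(1, 512, 100)   # aligned text features (placeholder shape)
    f0_curve     = torch.randn(1, 1, 200)     # pitch curve (placeholder shape)
    energy       = torch.randn(1, 1, 200)     # energy curve (placeholder shape)
    style        = torch.randn(1, 128)        # style vector (placeholder shape)

    torch.onnx.export(
        decoder,
        (asr_features, f0_curve, energy, style),
        out_path,
        input_names=["asr", "f0", "energy", "style"],
        output_names=["audio"],
        dynamic_axes={"asr": {2: "frames"}, "f0": {2: "frames"},
                      "energy": {2: "frames"}, "audio": {1: "samples"}},
        opset_version=17,
    )
```

The resulting ONNX file can then be built into a TensorRT engine, for example with `trtexec --onnx=decoder.onnx --saveEngine=decoder.plan`; the other submodules are exported the same way, one at a time.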

Hi @nityanandmathur. I have run the decoder model and the predictor.text_encoder model in TensorRT. This reduces my latency by over 50%. I have also cached the style vectors from diffusion...
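For reference, the caching can be as simple as memoizing the diffusion output per reference audio, so the slow diffusion sampler runs only once. A minimal sketch, where `sampler` stands in for whatever style-diffusion call your pipeline uses:

```python
import os
import torch

_style_cache = {}

def get_style(ref_wav_path, sampler, cache_dir="style_cache"):
    """Return the diffusion-sampled style vector for a reference audio,
    caching it in memory and on disk so diffusion runs once per reference.
    `sampler` is a placeholder for your own style-diffusion call."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, os.path.basename(ref_wav_path) + ".pt")

    if ref_wav_path in _style_cache:          # in-memory hit
        return _style_cache[ref_wav_path]
    if os.path.exists(cache_path):            # on-disk hit
        style = torch.load(cache_path)
    else:                                     # miss: run diffusion once and save
        style = sampler(ref_wav_path)
        torch.save(style, cache_path)

    _style_cache[ref_wav_path] = style
    return style
```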

The 300-second limit can be bypassed by addressing the underlying constraint: BERT's maximum input length of 512 tokens. You can tokenize your text into sentences and process them one by...
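A minimal sketch of that chunking approach, assuming a `tokenizer` and a `synthesize` callable from your own StyleTTS2 setup (both are placeholders here):

```python
import numpy as np
import nltk

nltk.download("punkt", quiet=True)

MAX_TOKENS = 512  # BERT-style hard limit on input tokens

def synthesize_long_text(text, tokenizer, synthesize):
    """Split text into sentences, group them under the token limit,
    synthesize each chunk, and concatenate the audio.
    `tokenizer` and `synthesize` are placeholders for your pipeline."""
    sentences = nltk.sent_tokenize(text)
    chunks, current, current_len = [], [], 0

    for sent in sentences:
        n_tokens = len(tokenizer(sent))
        if current and current_len + n_tokens > MAX_TOKENS:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)                   # a single over-long sentence would
        current_len += n_tokens                # still need further splitting

    if current:
        chunks.append(" ".join(current))

    audio_pieces = [synthesize(chunk) for chunk in chunks]  # one pass per chunk
    return np.concatenate(audio_pieces)
```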

Yes, you can clone your voice by providing a sample of your voice as the reference audio, but the results may not be very impressive. For better quality, you...
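For illustration, a minimal wrapper in the spirit of the StyleTTS2 demo notebooks; `compute_style` and `inference` are placeholders for whatever your setup exposes:

```python
import soundfile as sf

def clone_voice(text, ref_wav, compute_style, inference, out_path="cloned.wav"):
    """Hypothetical wrapper: `compute_style` maps a reference wav to a style
    embedding and `inference` synthesizes text conditioned on that style;
    both are placeholders for your own StyleTTS2 functions."""
    ref_style = compute_style(ref_wav)   # style embedding of *your* voice
    wav = inference(text, ref_style)     # zero-shot synthesis in that style
    sf.write(out_path, wav, 24000)       # StyleTTS2 typically outputs 24 kHz audio
    return wav
```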

Latency generally increases as the length of the input sentence grows. However, a slowdown for short sentences is not typical and might indicate an issue. I've worked with StyleTTS2 and...
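A quick way to check this is to time synthesis for progressively longer inputs; `synthesize` below is a placeholder for your inference call:

```python
import time

def benchmark_lengths(synthesize, base_sentence="The quick brown fox jumps over the lazy dog. "):
    """Time a placeholder `synthesize(text)` call on progressively longer inputs.
    Latency should grow roughly with length; short inputs coming out *slower*
    than long ones usually points to a setup issue (cold start, recompilation, etc.)."""
    results = []
    synthesize(base_sentence)                   # warm-up to exclude one-time costs
    for repeats in (1, 2, 4, 8, 16):
        text = base_sentence * repeats
        start = time.perf_counter()
        synthesize(text)
        elapsed = time.perf_counter() - start
        results.append((len(text), elapsed))
        print(f"{len(text):5d} chars -> {elapsed:.3f} s")
    return results
```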

Hi @Ananya21162. Without seeing the code, I can't say much, but I would suggest performing an internal ablation study: print the time taken for each component during...
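Something like the following sketch works; the stage names and signatures are placeholders for the actual StyleTTS2 components:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, timings):
    """Accumulate wall-clock time per pipeline stage.
    (If running on GPU, call torch.cuda.synchronize() inside the block
    before it exits to get accurate numbers.)"""
    start = time.perf_counter()
    yield
    timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)

def run_pipeline(text, text_encoder, predictor, diffusion, decoder):
    """Placeholder stages -- substitute the real StyleTTS2 calls and signatures."""
    timings = {}
    with timed("text_encoder", timings):
        enc = text_encoder(text)
    with timed("predictor", timings):
        dur, f0, energy = predictor(enc)
    with timed("style_diffusion", timings):
        style = diffusion(enc)
    with timed("decoder", timings):
        audio = decoder(enc, f0, energy, style)

    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name:>16}: {seconds:.3f} s")
    return audio
```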

It's unusual that fine-tuning StyleTTS2 increases the checkpoint file size, even though the number of parameters in the model remains the same. Has anyone identified the reason behind this size...
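One way to investigate is to load both checkpoints on CPU and compare what each top-level key actually stores; extra entries such as optimizer or EMA state (a common cause of larger files in PyTorch training code generally) would show up immediately. The paths below are placeholders:

```python
import torch

def tensor_bytes(obj):
    """Recursively sum the bytes of all tensors nested inside dicts/lists."""
    if torch.is_tensor(obj):
        return obj.numel() * obj.element_size()
    if isinstance(obj, dict):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0

def checkpoint_summary(path):
    """Print each top-level key of a checkpoint and its approximate tensor size."""
    ckpt = torch.load(path, map_location="cpu")
    for key, value in ckpt.items():
        print(f"{key:>20}: ~{tensor_bytes(value) / 1e6:.1f} MB")

checkpoint_summary("Models/pretrained.pth")   # placeholder paths --
checkpoint_summary("Models/finetuned.pth")    # compare the two outputs
```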