UmerrAhsan

Results: 7 comments by UmerrAhsan

StyleTTS2 consists of several models and components that work together to generate audio. To optimize it using TensorRT, you first need to convert each model separately from PyTorch to ONNX....
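As a rough illustration of that first step, here is a minimal sketch of exporting one submodule (the decoder) to ONNX. The decoder's actual `forward()` signature and the dummy shapes below are assumptions and must be replaced with the ones from the StyleTTS2 fork you use:

```python
import torch

def export_decoder_to_onnx(decoder, out_path="decoder.onnx"):
    """Export a StyleTTS2-style decoder submodule to ONNX.
    The input names and dummy shapes below are placeholders -- match them
    to whatever your decoder's forward() actually expects."""
    decoder = decoder.eval()

    asr_features = torch.randn(1, 512, 100)   # aligned text features (placeholder shape)
    f0_curve     = torch.randn(1, 1, 200)     # pitch curve (placeholder shape)
    energy       = torch.randn(1, 1, 200)     # energy curve (placeholder shape)
    style        = torch.randn(1, 128)        # style vector (placeholder shape)

    torch.onnx.export(
        decoder,
        (asr_features, f0_curve, energy, style),
        out_path,
        input_names=["asr", "f0", "energy", "style"],
        output_names=["audio"],
        dynamic_axes={"asr": {2: "frames"}, "f0": {2: "frames"},
                      "energy": {2: "frames"}, "audio": {1: "samples"}},
        opset_version=17,
    )
```

The resulting ONNX file can then be built into a TensorRT engine, for example with `trtexec --onnx=decoder.onnx --saveEngine=decoder.plan`; the other submodules are exported the same way, one at a time.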

Hi @nityanandmathur. I have run the decoder model and the predictor.text_encoder model in TensorRT. This reduces my latency by over 50%. I have also cached the style vectors from diffusion...
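For reference, the caching can be as simple as memoizing the diffusion output per reference audio, so the slow diffusion sampler runs only once. A minimal sketch, where `sampler` stands in for whatever style-diffusion call your pipeline uses:

```python
import os
import torch

_style_cache = {}

def get_style(ref_wav_path, sampler, cache_dir="style_cache"):
    """Return the diffusion-sampled style vector for a reference audio,
    caching it in memory and on disk so diffusion runs once per reference.
    `sampler` is a placeholder for your own style-diffusion call."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, os.path.basename(ref_wav_path) + ".pt")

    if ref_wav_path in _style_cache:          # in-memory hit
        return _style_cache[ref_wav_path]
    if os.path.exists(cache_path):            # on-disk hit
        style = torch.load(cache_path)
    else:                                     # miss: run diffusion once and save
        style = sampler(ref_wav_path)
        torch.save(style, cache_path)

    _style_cache[ref_wav_path] = style
    return style
```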

The 300-second limit can be bypassed by addressing the underlying constraint: BERT's maximum input length of 512 tokens. You can tokenize your text into sentences and process them one by...
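A minimal sketch of that chunking approach, assuming a `tokenizer` and a `synthesize` callable from your own StyleTTS2 setup (both are placeholders here):

```python
import numpy as np
import nltk

nltk.download("punkt", quiet=True)

MAX_TOKENS = 512  # BERT-style hard limit on input tokens

def synthesize_long_text(text, tokenizer, synthesize):
    """Split text into sentences, group them under the token limit,
    synthesize each chunk, and concatenate the audio.
    `tokenizer` and `synthesize` are placeholders for your pipeline."""
    sentences = nltk.sent_tokenize(text)
    chunks, current, current_len = [], [], 0

    for sent in sentences:
        n_tokens = len(tokenizer(sent))
        if current and current_len + n_tokens > MAX_TOKENS:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sent)                   # a single over-long sentence would
        current_len += n_tokens                # still need further splitting

    if current:
        chunks.append(" ".join(current))

    audio_pieces = [synthesize(chunk) for chunk in chunks]  # one pass per chunk
    return np.concatenate(audio_pieces)
```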

Yes, you can clone your voice by providing a sample of your voice as the reference audio, but the results may not be very impressive. For better quality, you...
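For illustration, a minimal wrapper in the spirit of the StyleTTS2 demo notebooks; `compute_style` and `inference` are placeholders for whatever your setup exposes:

```python
import soundfile as sf

def clone_voice(text, ref_wav, compute_style, inference, out_path="cloned.wav"):
    """Hypothetical wrapper: `compute_style` maps a reference wav to a style
    embedding and `inference` synthesizes text conditioned on that style;
    both are placeholders for your own StyleTTS2 functions."""
    ref_style = compute_style(ref_wav)   # style embedding of *your* voice
    wav = inference(text, ref_style)     # zero-shot synthesis in that style
    sf.write(out_path, wav, 24000)       # StyleTTS2 typically outputs 24 kHz audio
    return wav
```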

Latency generally increases as the length of the input sentence grows. However, a slowdown for short sentences is not typical and might indicate an issue. I've worked with StyleTTS2 and...
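A quick way to check this is to time synthesis for progressively longer inputs; `synthesize` below is a placeholder for your inference call:

```python
import time

def benchmark_lengths(synthesize, base_sentence="The quick brown fox jumps over the lazy dog. "):
    """Time a placeholder `synthesize(text)` call on progressively longer inputs.
    Latency should grow roughly with length; short inputs coming out *slower*
    than long ones usually points to a setup issue (cold start, recompilation, etc.)."""
    results = []
    synthesize(base_sentence)                   # warm-up to exclude one-time costs
    for repeats in (1, 2, 4, 8, 16):
        text = base_sentence * repeats
        start = time.perf_counter()
        synthesize(text)
        elapsed = time.perf_counter() - start
        results.append((len(text), elapsed))
        print(f"{len(text):5d} chars -> {elapsed:.3f} s")
    return results
```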

Hi @Ananya21162. Without seeing the code, I can't say much, but I would suggest performing an internal ablation study: print the time taken for each component during...
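Something like the following sketch works; the stage names and signatures are placeholders for the actual StyleTTS2 components:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, timings):
    """Accumulate wall-clock time per pipeline stage.
    (If running on GPU, call torch.cuda.synchronize() inside the block
    before it exits to get accurate numbers.)"""
    start = time.perf_counter()
    yield
    timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)

def run_pipeline(text, text_encoder, predictor, diffusion, decoder):
    """Placeholder stages -- substitute the real StyleTTS2 calls and signatures."""
    timings = {}
    with timed("text_encoder", timings):
        enc = text_encoder(text)
    with timed("predictor", timings):
        dur, f0, energy = predictor(enc)
    with timed("style_diffusion", timings):
        style = diffusion(enc)
    with timed("decoder", timings):
        audio = decoder(enc, f0, energy, style)

    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name:>16}: {seconds:.3f} s")
    return audio
```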

It's unusual that fine-tuning StyleTTS2 increases the checkpoint file size, even though the number of parameters in the model remains the same. Has anyone identified the reason behind this size...
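One way to investigate is to load both checkpoints on CPU and compare what each top-level key actually stores; extra entries such as optimizer or EMA state (a common cause of larger files in PyTorch training code generally) would show up immediately. The paths below are placeholders:

```python
import torch

def tensor_bytes(obj):
    """Recursively sum the bytes of all tensors nested inside dicts/lists."""
    if torch.is_tensor(obj):
        return obj.numel() * obj.element_size()
    if isinstance(obj, dict):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0

def checkpoint_summary(path):
    """Print each top-level key of a checkpoint and its approximate tensor size."""
    ckpt = torch.load(path, map_location="cpu")
    for key, value in ckpt.items():
        print(f"{key:>20}: ~{tensor_bytes(value) / 1e6:.1f} MB")

checkpoint_summary("Models/pretrained.pth")   # placeholder paths --
checkpoint_summary("Models/finetuned.pth")    # compare the two outputs
```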