StyleTTS2 TensorRT Optimization

Is there any TensorRT-optimised inference available?

Nov 06 '24 14:11 nityanandmathur

No

Nov 10 '24 08:11 78Alpha

StyleTTS2 consists of several models and components that work together to generate audio. To optimize it using TensorRT, you first need to convert each model separately from PyTorch to ONNX.

Once converted, you can either:

Run the ONNX models using ONNX Runtime with the TensorRT execution provider, or
Convert the ONNX models directly into TensorRT format and perform inference using TensorRT's Python or C++ API.

In my ablation experiments, the decoder was the most resource-intensive component of StyleTTS2. If you only want a partial optimization, converting just the decoder from PyTorch to ONNX and running it in TensorRT can give a significant speedup. Alternatively, converting all of the models to ONNX and running them in ONNX Runtime with the TensorRT execution provider also yields noticeable performance gains. I have tested this approach and it works.

Dec 12 '24 08:12 UmerrAhsan

Hi @UmerrAhsan, could you please share the overall latency of your ONNX models?

Dec 12 '24 11:12 nityanandmathur

Hi @nityanandmathur, I ran the decoder and the predictor.text_encoder model in TensorRT, which cut my latency by more than 50%. I also precomputed and cached the style vectors from the diffusion model and the style encoder. With that, a single short sentence runs in under 100 ms.
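A sketch of the two ideas in the post above, under stated assumptions. The first is memoizing style vectors so the style encoder and diffusion step run once per reference rather than once per sentence. The second is measuring latency with warm-up runs excluded, because the first TensorRT calls include engine building. `StyleCache`, `median_latency_ms` and the reference-id keys are hypothetical. The lambda stands in for the real StyleTTS2 style computation. Keying only by reference means every sentence for that speaker reuses one sampled style, which is the trade-off that caching accepts.

```python
import statistics
import time
from typing import Callable, Dict, Hashable, List


class StyleCache:
    """Memoizes style vectors per reference id (hypothetical helper)."""

    def __init__(self, compute: Callable[[Hashable], List[float]]):
        self._compute = compute  # stand-in for style encoder / diffusion step
        self._store: Dict[Hashable, List[float]] = {}
        self.misses = 0  # how many times the expensive step actually ran

    def get(self, ref_id: Hashable) -> List[float]:
        if ref_id not in self._store:
            self.misses += 1
            self._store[ref_id] = self._compute(ref_id)
        return self._store[ref_id]


def median_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock latency of fn() in milliseconds, after warm-up.

    If fn launches asynchronous CUDA work (e.g. PyTorch on GPU), synchronize
    inside fn before returning, or the timings will be too low.
    """
    for _ in range(warmup):
        fn()  # excluded: engine build, CUDA init, allocator growth
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Usage with a stand-in for the style computation.
fake_style = StyleCache(lambda ref_id: [0.0] * 128)
for ref in ["speaker_a.wav", "speaker_a.wav", "speaker_b.wav"]:
    fake_style.get(ref)
# fake_style.misses is now 2: speaker_a was computed once and then reused.
```

The median is reported instead of the mean so that an occasional slow run (garbage collection, a background process) does not distort a sub-100 ms figure.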

Dec 12 '24 12:12 UmerrAhsan

@UmerrAhsan, are you open to a contract job optimizing StyleTTS2 with TensorRT? If so, could we connect over email? shank187 at gmail dot com

May 29 '25 10:05 moesaeed

@UmerrAhsan, could you (or anyone else) share an example of how to load and run the ONNX model using C++? Thank you :)

Oct 22 '25 14:10 anadeprado4