
Possible to convert EncDecRNNTBPEModel into TensorRT and run it through Triton inference server?

Open jingzhaoo opened this issue 11 months ago • 1 comment

Is your feature request related to a problem? Please describe.

I would like to host the parakeet-tdt-0.6b-v2 model efficiently through the Triton inference server. I can run the model with the Python backend, as shown here. Are there any docs guiding me on how to convert the model to TensorRT engines and run them through the Triton inference server? I appreciate your help!

Describe the solution you'd like

Convert the parakeet-tdt-0.6b-v2 model into TensorRT engines, and run it through Triton inference server.

Describe alternatives you've considered

Run the model as a PyTorch model, which is already quite fast; I just want to run it even faster.

jingzhaoo avatar May 06 '25 06:05 jingzhaoo

You can export the Fast-Conformer encoder to TensorRT (via ONNX). TensorRT does not support the transducer (RNNT/TDT) decoding loop, so someone would have to implement it as a TRT plugin.
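To illustrate why the transducer side is the sticking point: greedy RNNT/TDT decoding emits a data-dependent number of tokens per encoder frame, so the inner loop's length is not known until runtime, which does not map onto a static TensorRT graph. Below is a minimal NumPy sketch of that loop; the joint and prediction networks are dummy stand-ins, and all shapes, weights, and names are illustrative, not taken from the parakeet model.

```python
import numpy as np

# Toy greedy transducer (RNNT-style) decoding. The number of tokens
# emitted at each frame depends on the data, which is the control flow
# TensorRT cannot express without a custom plugin.

BLANK = 0                  # blank symbol index
VOCAB = 5                  # toy vocabulary size (incl. blank)
MAX_SYMBOLS_PER_FRAME = 3  # cap on emissions per encoder frame

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((4, VOCAB))  # dummy joint-net weights
W_dec = rng.standard_normal((4, VOCAB))

def joint(enc_t, dec_state):
    """Dummy joint network: combines one encoder frame with the decoder state."""
    return enc_t @ W_enc + dec_state @ W_dec

def dec_step(state, token):
    """Dummy prediction network: updates decoder state after emitting a token."""
    return np.tanh(state + 0.1 * token)

def greedy_decode(enc):
    """enc: (T, 4) encoder outputs -> list of emitted token ids."""
    state = np.zeros(4)
    tokens = []
    for t in range(enc.shape[0]):
        # Inner loop length is data-dependent: keep emitting at frame t
        # until the joint predicts blank (or we hit the per-frame cap).
        for _ in range(MAX_SYMBOLS_PER_FRAME):
            k = int(np.argmax(joint(enc[t], state)))
            if k == BLANK:
                break
            tokens.append(k)
            state = dec_step(state, k)
    return tokens

enc = rng.standard_normal((6, 4))  # pretend encoder output, T=6 frames
print(greedy_decode(enc))
```

The encoder, by contrast, is a fixed feed-forward graph over the input frames, which is why it exports cleanly to ONNX and then to a TRT engine while the loop above stays on the host (e.g. in a Triton Python backend).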

anand-nv avatar May 08 '25 13:05 anand-nv

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jun 08 '25 02:06 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jun 15 '25 02:06 github-actions[bot]

@anand-nv any update on this ?

Can we now fully convert Parakeet to TRT, or does it still not support the transducer (RNNT/TDT)? I would be happy to submit a PR to TRT to support this.

StephennFernandes avatar Oct 24 '25 04:10 StephennFernandes