Is it possible to convert EncDecRNNTBPEModel to TensorRT and run it through Triton Inference Server?
Is your feature request related to a problem? Please describe.
I wonder how to efficiently host the parakeet-tdt-0.6b-v2 model through the Triton Inference Server. I can use the Python backend to run the model, like here. Are there any docs describing how to convert the model to TensorRT engines and run them through Triton? I appreciate your help!
Describe the solution you'd like
Convert the parakeet-tdt-0.6b-v2 model into TensorRT engines, and run them through the Triton Inference Server.
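If the encoder can be exported to a TensorRT engine, serving it from Triton's `tensorrt_plan` backend only requires a `config.pbtxt` next to the plan file. A minimal sketch, assuming a hypothetical model name, an 80-dim mel-feature input, and tensor names that must be checked against the actual ONNX export (they are not confirmed by this thread):

```
# config.pbtxt -- hypothetical names and shapes; verify against your export
name: "parakeet_encoder_trt"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "audio_signal"    # mel features, (feat_dim, time) per item
    data_type: TYPE_FP32
    dims: [ 80, -1 ]
  },
  {
    name: "length"          # valid frame count per item
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
output [
  {
    name: "encoded"         # encoder states, (hidden_dim, time_downsampled)
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```

The decoder/joint side would still have to run elsewhere (e.g. in a Python-backend model composed with the encoder via a Triton ensemble), since the transducer loop is not exportable to a plan file as-is.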
Describe alternatives you've considered
Run the model as a PyTorch model, which is already quite fast. I just want to run it even faster.
You can export the Fast-Conformer encoder to TRT (via ONNX). TensorRT doesn't support the transducer decoding loop (RNNT/TDT); someone would have to implement it as a TRT plugin.
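To see why the transducer side resists a static TensorRT export, here is a toy greedy RNNT-style decode loop in pure NumPy (toy callables and shapes for illustration, not the NeMo implementation): the number of prediction-network steps taken per encoder frame depends on which symbols are emitted at runtime, and that data-dependent control flow is what a fixed TRT graph cannot express without a custom plugin.

```python
import numpy as np

def greedy_rnnt_decode(enc, joint, pred, blank_id, max_symbols=3):
    """Toy greedy RNNT-style decode with a data-dependent inner loop.

    enc:   (T, D) encoder outputs, one row per frame
    joint: joint(f, g) -> (V,) logits over the vocabulary (incl. blank)
    pred:  pred(token or None) -> (D,) prediction-network state
    """
    hyp = []
    g = pred(None)                      # initial prediction-net state
    for t in range(enc.shape[0]):
        for _ in range(max_symbols):    # iteration count decided at runtime
            logits = joint(enc[t], g)
            k = int(np.argmax(logits))
            if k == blank_id:
                break                   # blank: advance to the next frame
            hyp.append(k)
            g = pred(k)                 # state depends on the emitted token
    return hyp
```

Each non-blank emission feeds back into the prediction network before the next joint evaluation, so the loop cannot be unrolled to a fixed depth ahead of time, which is the core obstacle for a plain ONNX-to-TRT conversion of the decoder.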
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
@anand-nv any update on this ?
Can we now fully convert Parakeet to TRT, or does it still not support the transducer (RNNT/TDT)? I would be happy to submit a PR to TRT to support this.