sherpa-onnx
[Help wanted] Support TensorRT
TODO
- [ ] Support GPU via TensorRT
See https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html
I would like to take this on.
- [ ] Support the Onnxruntime CUDA provider.
Hi @csukuangfj , @yuekaizhang
I observed that currently only CUDA EP support exists and there is no TensorRT EP support for onnxruntime. Are there any active developments going on for a TensorRT GPU backend?
We don't have a plan to support it in the near future. Would you like to contribute?
I tried triggering onnxruntime's TensorRT EP for zipformer, but the model performance was very bad. I am debugging further with standalone onnxruntime in Python for the encoder models and will update if I see good results.
Hi @csukuangfj, TensorRT has several parameters, and these are only valid if the TensorRT provider is chosen, so I need your suggestion on one of the two options below:
- Put the TRT configs in model-config.cc
- Create a new config for TRT and expose the required parameters from it
Thank you
Could you create a new config for TensorRT and add this config as a member field of OnlineModelConfig and OfflineModelConfig?
You can set the default values of this config as the one used in https://github.com/k2-fsa/sherpa-onnx/blob/b7148174739275dfc997af726be364245511239c/sherpa-onnx/csrc/session.cc#L137-L150
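A minimal sketch of what such a config could look like, following the pattern other sherpa-onnx configs use (a plain struct with a `ToString()` for logging). The struct name, fields, and defaults here are illustrative assumptions; the field names mirror the provider-option keys documented for onnxruntime's TensorRT EP, and the actual defaults should come from the `session.cc` lines linked above:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch of a TensorRT config struct (names/defaults are
// illustrative, not the final sherpa-onnx API). Field names follow the
// TensorRT EP provider-option keys documented by onnxruntime.
struct TensorrtConfig {
  int64_t trt_max_workspace_size = 2147483648;  // 2 GB
  int32_t trt_max_partition_iterations = 10;
  int32_t trt_min_subgraph_size = 5;
  bool trt_fp16_enable = true;
  bool trt_engine_cache_enable = true;
  std::string trt_engine_cache_path = ".";

  // ToString() mirrors how other sherpa-onnx configs print themselves.
  std::string ToString() const {
    std::ostringstream os;
    os << "TensorrtConfig(";
    os << "trt_max_workspace_size=" << trt_max_workspace_size << ", ";
    os << "trt_max_partition_iterations=" << trt_max_partition_iterations
       << ", ";
    os << "trt_min_subgraph_size=" << trt_min_subgraph_size << ", ";
    os << "trt_fp16_enable=" << (trt_fp16_enable ? "True" : "False") << ", ";
    os << "trt_engine_cache_enable="
       << (trt_engine_cache_enable ? "True" : "False") << ", ";
    os << "trt_engine_cache_path=\"" << trt_engine_cache_path << "\")";
    return os.str();
  }
};
```

The config would then be added as a member field of `OnlineModelConfig`/`OfflineModelConfig` and consulted only when the selected provider is TensorRT.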
Yes, I will send a PR for the configs separately in some time.
Current perf, TensorRT vs CUDA

TensorRT:

```
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 1.930044 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034984 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034912 ms
csrc/online-websocket-server-impl.cc:Run:256 Warm up completed : 3 times.
csrc/online-websocket-server.cc:main:79 Started!
csrc/online-websocket-server.cc:main:80 Listening on: 6007
csrc/online-websocket-server.cc:main:81 Number of work threads: 8
```

CUDA:

```
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.535651 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187492 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187698 ms
```
Apart from this, with TRT there is a huge session creation time, which is expected; the only way to handle it is to cache the engine images.
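Engine caching is controlled through the TensorRT EP's string provider options. As a sketch, here is a hypothetical helper (the function itself is not part of sherpa-onnx) that builds the key/value pairs one would hand to onnxruntime's TensorRT EP; the option keys `trt_engine_cache_enable`, `trt_engine_cache_path`, and `trt_fp16_enable` are the ones documented for that EP:

```cpp
#include <map>
#include <string>

// Hypothetical helper: assemble the string key/value provider options
// that onnxruntime's TensorRT EP accepts. With the engine cache enabled,
// later session creations reuse serialized engines from
// trt_engine_cache_path instead of rebuilding them, which is what cuts
// the huge first-time session creation cost.
std::map<std::string, std::string> BuildTrtProviderOptions(
    bool enable_cache, const std::string &cache_path) {
  std::map<std::string, std::string> opts;
  opts["trt_engine_cache_enable"] = enable_cache ? "1" : "0";
  if (enable_cache) {
    opts["trt_engine_cache_path"] = cache_path;
  }
  // fp16 is usually worth enabling on GPUs with tensor cores.
  opts["trt_fp16_enable"] = "1";
  return opts;
}
```

Note that cached engines are specific to the GPU, TensorRT version, and model, so the cache directory should be invalidated when any of those change.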
May I know the results for the CPU provider, if you have them? Also, could you explain why there are three lines in each block, e.g., 0.535651 ms, 0.187492 ms, 0.187698 ms? @manickavela29
I can try to get CPU numbers, but I don't have a high-performance CPU.
(In the meantime, someone could add support for the DNNL EP 🙂)
But since the focus here is GPU, CUDA vs TRT, is CPU benchmarking relevant?
The code blocks are just performance logs which I added for zipformer; they are not part of the patch.
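The three duration lines correspond to the three warm-up runs reported by "Warm up completed : 3 times" in the log: the first run pays one-time costs (TensorRT engine build/selection, allocator growth), so only the later runs reflect steady-state latency. A minimal sketch of that warm-up pattern, with a placeholder callable standing in for the real encoder call:

```cpp
#include <chrono>
#include <vector>

// Sketch of the warm-up pattern behind the three "Encoder Duration"
// lines: run the model a few times before serving traffic so one-time
// costs are paid up front. run_encoder_once is a placeholder for the
// real encoder invocation.
template <typename Fn>
std::vector<double> WarmUp(Fn run_encoder_once, int num_runs) {
  std::vector<double> durations_ms;
  durations_ms.reserve(num_runs);
  for (int i = 0; i < num_runs; ++i) {
    auto start = std::chrono::steady_clock::now();
    run_encoder_once();
    auto end = std::chrono::steady_clock::now();
    durations_ms.push_back(
        std::chrono::duration<double, std::milli>(end - start).count());
  }
  // The first entry typically dominates (one-time setup); later entries
  // approximate steady-state latency.
  return durations_ms;
}
```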
Hi @csukuangfj https://github.com/k2-fsa/sherpa-onnx/pull/992
I will create configs for the execution providers all together and integrate them with the sessions. Let me know if you have any other thoughts. Still WIP.