opencompass
[Feature] Add support for TensorRT-LLM inference engine
Describe the feature
Hi guys,
TensorRT-LLM was released last week. It is maintained by NVIDIA and offers high inference performance. Link: https://github.com/NVIDIA/TensorRT-LLM
Will it be implemented via API calls, or integrated directly into the inference pipeline, like the existing HuggingFace inference method? Which approach is better?
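To make the two options concrete, here is a minimal sketch of what they could look like behind a common interface. Everything here is hypothetical (the class names, the endpoint, the `engine_path` argument, and the stubbed `generate` bodies are placeholders, not real TensorRT-LLM or OpenCompass APIs); a real integration would fill in the HTTP call or the in-process TensorRT-LLM runtime respectively.

```python
from abc import ABC, abstractmethod
from typing import List


class InferenceBackend(ABC):
    """Common interface both integration styles would need to satisfy."""

    @abstractmethod
    def generate(self, prompts: List[str]) -> List[str]:
        ...


class APIBackend(InferenceBackend):
    """Option 1: call a standalone TensorRT-LLM serving endpoint over HTTP.

    The endpoint URL and response handling here are placeholders.
    """

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def generate(self, prompts: List[str]) -> List[str]:
        # A real implementation would POST each prompt to self.endpoint
        # (e.g. a server fronting TensorRT-LLM) and parse the completions.
        return [f"[api:{self.endpoint}] {p}" for p in prompts]


class InProcessBackend(InferenceBackend):
    """Option 2: run the engine in-process, mirroring how the HuggingFace
    path loads models directly. `engine_path` is illustrative only.
    """

    def __init__(self, engine_path: str):
        self.engine_path = engine_path

    def generate(self, prompts: List[str]) -> List[str]:
        # A real implementation would invoke the TensorRT-LLM runtime here.
        return [f"[local:{self.engine_path}] {p}" for p in prompts]


def run_eval(backend: InferenceBackend, prompts: List[str]) -> List[str]:
    # Evaluation code stays the same regardless of which backend is chosen.
    return backend.generate(prompts)
```

The trade-off sketched here: the API route keeps the heavy CUDA/TensorRT dependencies out of the evaluation process, while the in-process route avoids a serving layer and matches the existing HuggingFace code path more closely.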
Thanks
Will you implement it?
- [ ] I would like to implement this feature and create a PR!