
Support for TensorRT

Open Matthieu-Tinycoaching opened this issue 4 years ago • 7 comments

Regarding the high performance of transformer models with TensorRT, would it be feasible to adapt BentoML to work with it?

Matthieu-Tinycoaching · Jul 22 '21 18:07

I assume this is for the Docker container?

aarnphm · Jul 25 '21 10:07

Yes for sure.

Matthieu-Tinycoaching · Jul 25 '21 14:07

You can build it on top of the Docker GPU image we provided and work from there 😄
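For readers looking for a starting point, the suggestion above might look roughly like this Dockerfile. This is a minimal sketch, not an official recipe: the base image tag, the `nvidia-tensorrt` pip package, and the engine path are all assumptions.

```dockerfile
# Sketch only: base image tag and package names are assumptions.
# Start from a BentoML GPU model-server image and layer TensorRT on top.
FROM bentoml/model-server:0.13.1-py38-gpu

# Install the TensorRT Python bindings (exact package/version is illustrative).
RUN pip install --no-cache-dir nvidia-tensorrt

# Copy a pre-built TensorRT engine into the image (path is hypothetical).
COPY model.trt /home/bentoml/model.trt
```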

aarnphm · Jul 25 '21 15:07

Hi @aarnphm, sorry, I don't really know what TensorRT involves. Doesn't it correspond to a framework/runtime like ONNX?

Matthieu-Tinycoaching · Jul 25 '21 15:07

+1 on tensorrt support. Benchmarks against Triton would also be great!

yaysummeriscoming · Aug 25 '21 10:08

we will consider this integration after bentoml 1.0

aarnphm · Aug 25 '21 13:08

Hello! Do we have any updates on the TensorRT integration? First-class TensorRT support for model serving seems to be a very useful feature.

hxu296 · Oct 24 '22 19:10

This is supported in BentoML 1.2 now! An example project is coming soon!
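Until the official example lands, a BentoML 1.2 service wrapping a TensorRT engine might be sketched as below. This is an illustrative outline, not the project's example: the service name, the `model.trt` path, and the elided buffer handling are assumptions, and running it requires a CUDA-capable GPU plus the `tensorrt` package.

```python
import numpy as np
import bentoml

@bentoml.service(resources={"gpu": 1})
class TensorRTRunner:
    def __init__(self) -> None:
        # Deserialize a pre-built TensorRT engine.
        # The path "model.trt" is hypothetical; import is deferred so the
        # module can be inspected without a GPU present.
        import tensorrt as trt
        logger = trt.Logger(trt.Logger.WARNING)
        with open("model.trt", "rb") as f:
            self.engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()

    @bentoml.api
    def predict(self, input_array: np.ndarray) -> np.ndarray:
        # Allocate device buffers, run the execution context, and copy the
        # output back to host memory. (Buffer management is elided here;
        # see NVIDIA's TensorRT Python samples for the full pattern.)
        ...
```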

parano · Mar 04 '24 17:03