
Support for TensorRT

Open Matthieu-Tinycoaching opened this issue 4 years ago • 7 comments

Regarding the high performance of transformer models with TensorRT, would it be feasible to adapt BentoML to work with it?

Matthieu-Tinycoaching · Jul 22 '21 18:07

I assume this is for the Docker container?

aarnphm · Jul 25 '21 10:07

Yes for sure.

Matthieu-Tinycoaching · Jul 25 '21 14:07

You can build it on top of the Docker GPU image we provided and work from there 😄
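For readers looking for a starting point, the suggestion above might look roughly like this Dockerfile. This is a minimal sketch, not an official recipe: the base image tag, the `nvidia-tensorrt` pip package, and the engine path are all assumptions.

```dockerfile
# Sketch only: base image tag and package names are assumptions.
# Start from a BentoML GPU model-server image and layer TensorRT on top.
FROM bentoml/model-server:0.13.1-py38-gpu

# Install the TensorRT Python bindings (exact package/version is illustrative).
RUN pip install --no-cache-dir nvidia-tensorrt

# Copy a pre-built TensorRT engine into the image (path is hypothetical).
COPY model.trt /home/bentoml/model.trt
```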

aarnphm · Jul 25 '21 15:07

Hi @aarnphm, sorry, I don't really know what TensorRT involves. Doesn't it correspond to a framework/runtime like ONNX?

Matthieu-Tinycoaching · Jul 25 '21 15:07

+1 on tensorrt support. Benchmarks against Triton would also be great!

yaysummeriscoming · Aug 25 '21 10:08

we will consider this integration after bentoml 1.0

aarnphm · Aug 25 '21 13:08

Hello! Do we have any updates on the TensorRT integration? First-class TensorRT support for model serving seems to be a very useful feature.

hxu296 · Oct 24 '22 19:10

This is supported in BentoML 1.2 now! An example project is coming soon!
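Until the official example lands, a BentoML 1.2 service wrapping a TensorRT engine might be sketched as below. This is an illustrative outline, not the project's example: the service name, the `model.trt` path, and the elided buffer handling are assumptions, and running it requires a CUDA-capable GPU plus the `tensorrt` package.

```python
import numpy as np
import bentoml

@bentoml.service(resources={"gpu": 1})
class TensorRTRunner:
    def __init__(self) -> None:
        # Deserialize a pre-built TensorRT engine.
        # The path "model.trt" is hypothetical; import is deferred so the
        # module can be inspected without a GPU present.
        import tensorrt as trt
        logger = trt.Logger(trt.Logger.WARNING)
        with open("model.trt", "rb") as f:
            self.engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()

    @bentoml.api
    def predict(self, input_array: np.ndarray) -> np.ndarray:
        # Allocate device buffers, run the execution context, and copy the
        # output back to host memory. (Buffer management is elided here;
        # see NVIDIA's TensorRT Python samples for the full pattern.)
        ...
```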

parano · Mar 04 '24 17:03