TensorRT-LLM
Using TensorRT-LLM/TensorRT to compile a custom model based on a Mistral LLM and a ViT image encoder
I have a model that combines two components:
- Image Encoder: Based on the ViT-G/14 vision transformer model.
- Language Model: A Mistral-based large language model (LLM).
At a high level, the output from the Image Encoder is processed, concatenated with other tokens, and then fed into the Mistral LLM.
In my current implementation, a single class initializes both models and runs a forward pass through them.
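For reference, my wrapper looks roughly like the sketch below (module names, shapes, and the `get_input_embeddings()` call are illustrative; my actual code differs in the details):

```python
# Rough sketch of the current eager-mode wrapper (names are illustrative only).
import torch
import torch.nn as nn


class VisionLanguageModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, projector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # ViT-G/14 backbone
        self.projector = projector            # maps image features to the LLM hidden size
        self.llm = llm                        # Mistral-based LLM (HF-style interface assumed)

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        # 1) Encode the image into patch features.
        image_features = self.vision_encoder(pixel_values)
        # 2) Project the features into the LLM embedding space.
        image_embeds = self.projector(image_features)
        # 3) Concatenate with the text token embeddings and run the LLM.
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds).logits
```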
For running inference on this model, I'm exploring TensorRT and TensorRT-LLM. It seems like these components can be individually compiled—Mistral is supported in TensorRT-LLM, and ViT-G can be compiled using TensorRT.
My question is: How can I leverage both TensorRT and TensorRT-LLM to run inference on this custom vision-language architecture? Specifically:
- Is it possible to compile and optimize the two components (ViT-G and Mistral) separately using their respective tools (TensorRT and TensorRT-LLM)?
- If so, how can I combine the optimized components during inference to run the entire vision-language model pipeline efficiently?

Any guidance or examples on this would be greatly appreciated. Thank you!
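For context, the kind of pipeline I have in mind is sketched below: run the compiled ViT engine with the TensorRT Python API, then hand the projected image embeddings to the Mistral TensorRT-LLM engine as extra "virtual tokens" via a prompt table, similar to the multimodal examples. All engine paths, tensor shapes, token ids, and keyword names here are my assumptions and depend on how the engines are built and on the TensorRT / TensorRT-LLM versions:

```python
# Hedged sketch: TensorRT engine for the vision encoder + TensorRT-LLM engine for
# the LLM, connected through a prompt table of image embeddings. Not tested.
import tensorrt as trt
import torch
from tensorrt_llm.runtime import ModelRunner

# --- 1) Vision encoder: TensorRT engine compiled from the ViT-G/14 (e.g. via ONNX) ---
logger = trt.Logger(trt.Logger.WARNING)
with open("vit_g14.engine", "rb") as f:          # assumed engine file name
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

pixel_values = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
image_embeds = torch.empty(1, 256, 4096, device="cuda", dtype=torch.float16)  # assumed output shape

# TensorRT 8.x-style binding execution; newer releases use the named-tensor APIs instead.
context.execute_v2(bindings=[pixel_values.data_ptr(), image_embeds.data_ptr()])

# --- 2) Language model: Mistral engine built with TensorRT-LLM ---
runner = ModelRunner.from_dir(engine_dir="mistral_engine_dir")   # assumed engine directory

# The image embeddings are injected as prompt-tuning "virtual tokens"; the text
# input_ids would need matching placeholder ids for those positions, as in the
# TensorRT-LLM multimodal examples. Token ids below are placeholders.
input_ids = [torch.tensor([1, 733, 16289, 28793], dtype=torch.int32)]

outputs = runner.generate(
    batch_input_ids=input_ids,
    max_new_tokens=64,
    prompt_table=image_embeds,   # exact keyword varies across TensorRT-LLM versions
    end_id=2,
    pad_id=2,
)
print(outputs)
```

Is this roughly the intended way to connect a TensorRT vision engine to a TensorRT-LLM decoder, or is there a more direct mechanism I should be using?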