TensorRT-LLM
Using TensorRT-LLM/TensorRT to compile a custom model based on a Mistral LLM and a ViT image encoder
I have a model that combines two components:
- Image Encoder: Based on the ViT-G/14 vision transformer model.
- Language Model: A Mistral-based large language model (LLM).
At a high level, the output from the Image Encoder is processed, concatenated with other tokens, and then fed into the Mistral LLM.
In my current implementation, a single class initializes both models and runs a forward pass through them.
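For reference, my wrapper looks roughly like the sketch below (module names, shapes, and the `get_input_embeddings()` call are illustrative; my actual code differs in the details):

```python
# Rough sketch of the current eager-mode wrapper (names are illustrative only).
import torch
import torch.nn as nn


class VisionLanguageModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, projector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_encoder = vision_encoder  # ViT-G/14 backbone
        self.projector = projector            # maps image features to the LLM hidden size
        self.llm = llm                        # Mistral-based LLM (HF-style interface assumed)

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        # 1) Encode the image into patch features.
        image_features = self.vision_encoder(pixel_values)
        # 2) Project the features into the LLM embedding space.
        image_embeds = self.projector(image_features)
        # 3) Concatenate with the text token embeddings and run the LLM.
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds).logits
```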
For running inference on this model, I'm exploring TensorRT and TensorRT-LLM. It seems like these components can be individually compiled—Mistral is supported in TensorRT-LLM, and ViT-G can be compiled using TensorRT.
My question is: How can I leverage both TensorRT and TensorRT-LLM to run inference on this custom vision-language architecture? Specifically:
- Is it possible to compile and optimize the two components (ViT-G and Mistral) separately using their respective tools (TensorRT and TensorRT-LLM)?
- If so, how can I combine the optimized components during inference to run the entire vision-language model pipeline efficiently?

Any guidance or examples on this would be greatly appreciated. Thank you!
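For context, the kind of pipeline I have in mind is sketched below: run the compiled ViT engine with the TensorRT Python API, then hand the projected image embeddings to the Mistral TensorRT-LLM engine as extra "virtual tokens" via a prompt table, similar to the multimodal examples. All engine paths, tensor shapes, token ids, and keyword names here are my assumptions and depend on how the engines are built and on the TensorRT / TensorRT-LLM versions:

```python
# Hedged sketch: TensorRT engine for the vision encoder + TensorRT-LLM engine for
# the LLM, connected through a prompt table of image embeddings. Not tested.
import tensorrt as trt
import torch
from tensorrt_llm.runtime import ModelRunner

# --- 1) Vision encoder: TensorRT engine compiled from the ViT-G/14 (e.g. via ONNX) ---
logger = trt.Logger(trt.Logger.WARNING)
with open("vit_g14.engine", "rb") as f:          # assumed engine file name
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

pixel_values = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
image_embeds = torch.empty(1, 256, 4096, device="cuda", dtype=torch.float16)  # assumed output shape

# TensorRT 8.x-style binding execution; newer releases use the named-tensor APIs instead.
context.execute_v2(bindings=[pixel_values.data_ptr(), image_embeds.data_ptr()])

# --- 2) Language model: Mistral engine built with TensorRT-LLM ---
runner = ModelRunner.from_dir(engine_dir="mistral_engine_dir")   # assumed engine directory

# The image embeddings are injected as prompt-tuning "virtual tokens"; the text
# input_ids would need matching placeholder ids for those positions, as in the
# TensorRT-LLM multimodal examples. Token ids below are placeholders.
input_ids = [torch.tensor([1, 733, 16289, 28793], dtype=torch.int32)]

outputs = runner.generate(
    batch_input_ids=input_ids,
    max_new_tokens=64,
    prompt_table=image_embeds,   # exact keyword varies across TensorRT-LLM versions
    end_id=2,
    pad_id=2,
)
print(outputs)
```

Is this roughly the intended way to connect a TensorRT vision engine to a TensorRT-LLM decoder, or is there a more direct mechanism I should be using?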