How to do a clean model.forward() with tensor input and tensor output with TensorRT-LLM?
A similar issue here: https://github.com/NVIDIA/TensorRT-LLM/issues/158
The ModelRunner seems to only expose a generate function, which wraps many complicated operations defined in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py, with no clean model.forward().
What if I simply want to use TensorRT-LLM to run a plain model forward pass: feed in an input tensor and get the output logits tensor back? Something like this:
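To illustrate what I mean, a hypothetical PyTorch-style call (`trt_llm_model` and its `forward` are made-up names for illustration, not a real TensorRT-LLM API):

```python
import torch

# Hypothetical API, for illustration only -- TensorRT-LLM does not expose this.
input_ids = torch.randint(0, 32000, (1, 16), dtype=torch.int32, device='cuda')
logits = trt_llm_model.forward(input_ids)  # -> [batch, seq_len, vocab_size]
```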
We don't have such an example now. You could get the logits here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2238-L2261.
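For reference, a rough sketch of retrieving logits through ModelRunner in recent versions; this assumes the engine was built with `--gather_context_logits`, and the exact flags and return keys may differ across releases:

```python
import torch
from tensorrt_llm.runtime import ModelRunner

# Assumes an engine built with --gather_context_logits, so that context
# logits are gathered and returned alongside the generated tokens.
runner = ModelRunner.from_dir(engine_dir='/path/to/engine')  # placeholder path
input_ids = [torch.randint(0, 32000, (16,), dtype=torch.int32)]
outputs = runner.generate(input_ids, max_new_tokens=1, return_dict=True)
context_logits = outputs['context_logits']  # [batch, prompt_len, vocab_size]
```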
Hi @brisker, please try the model unit test here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py. I think this can probably meet your needs. @byshiue You can assign this issue to me.
@StudyingShao there are three forward calls in the file you mentioned.
First, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L114 -- this only builds the TRT network; it is not a real model.forward.
The other two, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L301 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L381 -- these are just the HuggingFace model forward, not the TRT-LLM engine forward.
Hi @brisker, TensorRT-LLM uses TensorRT as its inference framework, so its usage differs from the PyTorch API that HuggingFace builds on. There is no explicit model.forward() in TensorRT-LLM, but you can use https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L360-L365 to achieve the same functionality: run the TensorRT engine directly through a runtime session.
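The pattern in that test looks roughly like the sketch below. To keep it short, it assumes an engine whose only input is `input_ids` and whose only output is `logits`; a real LLaMA engine requires more inputs (position ids, context lengths, KV-cache buffers, etc.), which the test sets up in full. The engine path is a placeholder.

```python
import torch
from tensorrt_llm._utils import str_dtype_to_trt, trt_dtype_to_torch
from tensorrt_llm.runtime import Session, TensorInfo

# Load a serialized TensorRT engine produced by a TensorRT-LLM build.
with open('llama_float16.engine', 'rb') as f:  # placeholder path
    session = Session.from_serialized_engine(f.read())

batch_size, seq_len = 1, 16
input_ids = torch.randint(0, 32000, (batch_size, seq_len),
                          dtype=torch.int32, device='cuda')
inputs = {'input_ids': input_ids}

# Let TensorRT infer the output shapes for these input shapes,
# then allocate matching output buffers on the GPU.
output_info = session.infer_shapes([
    TensorInfo('input_ids', str_dtype_to_trt('int32'), input_ids.shape)
])
outputs = {
    t.name: torch.empty(tuple(t.shape),
                        dtype=trt_dtype_to_torch(t.dtype),
                        device='cuda')
    for t in output_info
}

# One forward pass of the engine: tensors in, tensors out.
stream = torch.cuda.current_stream().cuda_stream
assert session.run(inputs, outputs, stream)
torch.cuda.synchronize()
logits = outputs['logits']  # [batch_size, seq_len, vocab_size]
```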
Hi @brisker, does this solve your problem? Please close the issue if there are no further questions. :)