How to do a clean model.forward() with tensor input and tensor output with TensorRT-LLM?
A similar issue here: https://github.com/NVIDIA/TensorRT-LLM/issues/158
The ModelRunner seems to only expose a generate function, which wraps many complicated operations defined in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py, with no clean model.forward().
What if I simply want to use TensorRT-LLM to run a plain model forward pass: feed in an input tensor and get the output logits tensor back? Something like this:
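To illustrate what I mean, a hypothetical PyTorch-style call (`trt_llm_model` and its `forward` are made-up names for illustration, not a real TensorRT-LLM API):

```python
import torch

# Hypothetical API, for illustration only -- TensorRT-LLM does not expose this.
input_ids = torch.randint(0, 32000, (1, 16), dtype=torch.int32, device='cuda')
logits = trt_llm_model.forward(input_ids)  # -> [batch, seq_len, vocab_size]
```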
We don't have such an example now. You could get the logits here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2238-L2261.
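For reference, a rough sketch of retrieving logits through ModelRunner in recent versions; this assumes the engine was built with `--gather_context_logits`, and the exact flags and return keys may differ across releases:

```python
import torch
from tensorrt_llm.runtime import ModelRunner

# Assumes an engine built with --gather_context_logits, so that context
# logits are gathered and returned alongside the generated tokens.
runner = ModelRunner.from_dir(engine_dir='/path/to/engine')  # placeholder path
input_ids = [torch.randint(0, 32000, (16,), dtype=torch.int32)]
outputs = runner.generate(input_ids, max_new_tokens=1, return_dict=True)
context_logits = outputs['context_logits']  # [batch, prompt_len, vocab_size]
```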
Hi @brisker, please try the model unit test here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py. I think this can probably meet your needs. @byshiue You can assign this issue to me.
@StudyingShao there are three forward calls in the file you mentioned.
First, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L114 -- this only builds the TRT network; it is not a real model.forward.
The other two, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L301 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L381 -- these are just the HuggingFace model forward, not the TRT-LLM engine forward.
Hi @brisker, TensorRT-LLM uses TensorRT as its inference framework, so its usage differs from the PyTorch API that HuggingFace builds on. There is no explicit model.forward() in TensorRT-LLM, but you can use https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/model/test_llama.py#L360-L365 to achieve the same functionality: run the TensorRT engine directly through a runtime session.
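The pattern in that test looks roughly like the sketch below. To keep it short, it assumes an engine whose only input is `input_ids` and whose only output is `logits`; a real LLaMA engine requires more inputs (position ids, context lengths, KV-cache buffers, etc.), which the test sets up in full. The engine path is a placeholder.

```python
import torch
from tensorrt_llm._utils import str_dtype_to_trt, trt_dtype_to_torch
from tensorrt_llm.runtime import Session, TensorInfo

# Load a serialized TensorRT engine produced by a TensorRT-LLM build.
with open('llama_float16.engine', 'rb') as f:  # placeholder path
    session = Session.from_serialized_engine(f.read())

batch_size, seq_len = 1, 16
input_ids = torch.randint(0, 32000, (batch_size, seq_len),
                          dtype=torch.int32, device='cuda')
inputs = {'input_ids': input_ids}

# Let TensorRT infer the output shapes for these input shapes,
# then allocate matching output buffers on the GPU.
output_info = session.infer_shapes([
    TensorInfo('input_ids', str_dtype_to_trt('int32'), input_ids.shape)
])
outputs = {
    t.name: torch.empty(tuple(t.shape),
                        dtype=trt_dtype_to_torch(t.dtype),
                        device='cuda')
    for t in output_info
}

# One forward pass of the engine: tensors in, tensors out.
stream = torch.cuda.current_stream().cuda_stream
assert session.run(inputs, outputs, stream)
torch.cuda.synchronize()
logits = outputs['logits']  # [batch_size, seq_len, vocab_size]
```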
Hi @brisker, does this solve your problem? Please close the issue if there are no further questions. :)