
How to pass hidden_states to the LLM directly when using in-flight batching?

Open JoursBleu opened this issue 1 year ago • 2 comments

Is there any way to pass hidden_states to the LLM directly when using in-flight batching?

For example:

In multimodal case, the image feature embedding is done by vision_tower and projector.

Generally, we can pass these hidden_states with "prompt_table" param.
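For context, the prompt-table ("p-tuning") mechanism works by assigning the image features virtual token ids that start at vocab_size; the engine looks those ids up in the supplied table instead of the word embedding matrix. A minimal sketch of the idea (names and the ordering of image vs. text tokens are illustrative, not the TensorRT-LLM API):

```python
import numpy as np

# Illustrative constants; real values come from the model config.
VOCAB_SIZE = 32000
HIDDEN_SIZE = 4096

def build_multimodal_input(text_ids, image_features):
    """Splice virtual token ids for image features into the text ids.

    text_ids:       list[int], regular token ids (< VOCAB_SIZE)
    image_features: np.ndarray [num_image_tokens, HIDDEN_SIZE],
                    output of the vision tower + projector
    Returns (input_ids, prompt_table).
    """
    num_virtual = image_features.shape[0]
    # Virtual ids live past the end of the real vocabulary, so the
    # engine can tell them apart from ordinary word tokens.
    virtual_ids = list(range(VOCAB_SIZE, VOCAB_SIZE + num_virtual))
    # Image tokens first, then the text prompt (order is model-specific).
    input_ids = virtual_ids + text_ids
    return input_ids, image_features

# Mock vision features for 8 image patches.
feats = np.random.rand(8, HIDDEN_SIZE).astype(np.float32)
ids, table = build_multimodal_input([1, 42, 7], feats)
```

The question below is how to attach such a table to a request in the in-flight batching path.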

But it seems the "GenerationRequest" does not have a "prompt_table" attribute...

How can we pass these image feature hidden_states to the LLM?

JoursBleu avatar Apr 23 '24 01:04 JoursBleu

We do not have support in the runtime for that at the moment. Is this something that could be handled inside the engine, @QiJune ?

MartinMarciniszyn avatar Apr 24 '24 07:04 MartinMarciniszyn


@MartinMarciniszyn Does it support passing hidden_states when using python model_runner.py?

baby-care avatar May 15 '24 16:05 baby-care

It seems that prompt_table_path exists in InferenceRequest; maybe you can have a look. I'll give it a try soon.

https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/pybind/batch_manager/inferenceRequest.cpp#L141
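If the runtime accepts a prompt_table_path, one way to produce that file is to dump the projected image features to a .npy file and pass its path with the request. A hedged sketch follows; the expected shape and dtype ([1, num_tokens, hidden], float16 below) are assumptions based on TensorRT-LLM's example scripts and should be verified against the loader actually used:

```python
import numpy as np

def save_prompt_table(image_features, path):
    """Serialize projected image features as a prompt table.

    image_features: np.ndarray [num_image_tokens, hidden_size]
    path:           destination .npy file for prompt_table_path
    Returns the saved array's shape.

    Assumption: the loader expects [1, num_tokens, hidden] float16;
    adjust if the consuming code differs.
    """
    table = image_features.astype(np.float16)[None, ...]  # add batch dim
    np.save(path, table)
    return table.shape

# Mock projector output: 8 image tokens with hidden size 4096.
feats = np.random.rand(8, 4096).astype(np.float32)
shape = save_prompt_table(feats, "/tmp/prompt_table.npy")
```

The saved file could then be referenced via the prompt_table_path field mentioned above.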

littletomatodonkey avatar May 31 '24 06:05 littletomatodonkey