TensorRT-LLM
LLaVA batch inference: only the result corresponding to the longest prompt is correct; the other results are incorrect
Version: TensorRT-LLM 0.10.0
The official script (TensorRT-LLM/examples/multimodal/run.py) repeats the same prompt to form a batch. But if I form a batch from different prompts, the results are incorrect. How can I solve this?
Because only the result corresponding to the longest prompt is correct, I suspect the cause is padding; see the sketch below.
If I use the same prompt for every sequence in the batch, the results are correct.
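
Here is a minimal sketch of what I mean, assuming the batch is built with a Hugging Face tokenizer the way run.py builds its inputs. The model name `llava-hf/llava-1.5-7b-hf` and the two prompts are placeholders for illustration, not my actual setup:

```python
# Minimal sketch of the suspected padding issue (assumed setup, not my exact config).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")  # hypothetical model dir
tokenizer.padding_side = "right"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "USER: <image>\nDescribe the image. ASSISTANT:",            # shorter prompt
    "USER: <image>\nDescribe the image in great detail, "
    "including all colors and objects. ASSISTANT:",             # longest prompt
]

# padding=True pads every sequence to the batch maximum, so only the
# longest prompt is pad-free -- matching the symptom that only its
# result is correct.
batch = tokenizer(prompts, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)    # (2, max_len); shorter rows end in pad ids
print(batch["attention_mask"][0])  # trailing 0s mark padding that must be ignored

# The true per-sequence lengths that the runtime would need in order to
# skip the pad positions when decoding.
true_lengths = batch["attention_mask"].sum(dim=1)
print(true_lengths)
```

If the engine consumes the padded `input_ids` without also receiving the true per-sequence lengths (or equivalent masking), the pad tokens become part of the shorter prompts' context, which would explain why only the pad-free longest prompt decodes correctly.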