TensorRT-LLM
LLaVA batch inference: only the result corresponding to the longest prompt is correct; the other results are incorrect
Version: TensorRT-LLM 0.10.0
The official script (TensorRT-LLM/examples/multimodal/run.py) repeats the same prompt to form a batch. But if I form a batch from different prompts, the results are incorrect. How can I solve this?
Because only the result corresponding to the longest prompt is correct, I suspect the cause is padding; see the sketch below.
If I use the same prompt for every sequence in the batch, the results are correct.
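
Here is a minimal sketch of what I mean, assuming the batch is built with a Hugging Face tokenizer the way run.py builds its inputs. The model name `llava-hf/llava-1.5-7b-hf` and the two prompts are placeholders for illustration, not my actual setup:

```python
# Minimal sketch of the suspected padding issue (assumed setup, not my exact config).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")  # hypothetical model dir
tokenizer.padding_side = "right"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "USER: <image>\nDescribe the image. ASSISTANT:",            # shorter prompt
    "USER: <image>\nDescribe the image in great detail, "
    "including all colors and objects. ASSISTANT:",             # longest prompt
]

# padding=True pads every sequence to the batch maximum, so only the
# longest prompt is pad-free -- matching the symptom that only its
# result is correct.
batch = tokenizer(prompts, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)    # (2, max_len); shorter rows end in pad ids
print(batch["attention_mask"][0])  # trailing 0s mark padding that must be ignored

# The true per-sequence lengths that the runtime would need in order to
# skip the pad positions when decoding.
true_lengths = batch["attention_mask"].sum(dim=1)
print(true_lengths)
```

If the engine consumes the padded `input_ids` without also receiving the true per-sequence lengths (or equivalent masking), the pad tokens become part of the shorter prompts' context, which would explain why only the pad-free longest prompt decodes correctly.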