TensorRT-LLM
QwenVL visual_encoder failure
System Info
```
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[02/16/2024-22:04:57] [TRT-LLM] [I] Loading engine from ./plan/visual_encoder/visual_encoder_fp16.plan
[02/16/2024-22:05:00] [TRT-LLM] [I] Creating session from engine ./plan/visual_encoder/visual_encoder_fp16.plan
[02/16/2024-22:05:00] [TRT] [I] Loaded engine size: 3714 MiB
[02/16/2024-22:05:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +3699, now: CPU 0, GPU 3699 (MiB)
[02/16/2024-22:05:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +190, now: CPU 0, GPU 3889 (MiB)
[02/16/2024-22:05:00] [TRT] [E] 3: [executionContext.cpp::setInputShape::2278] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2278, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
Traceback (most recent call last):
  File "/home/kye/TensorRT-LLM/examples/qwenvl/run.py", line 481, in
```
I've downloaded the model and followed the instructions, and I know Qwen-VL's input resolution is 448. I'm not sure why the example run.py forces the image to 224, which is probably what causes the error.
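For what it's worth, the static input shape baked into the plan can be checked directly (a minimal sketch, assuming TensorRT >= 8.5 and the tensor name 'input' that vit_process uses below):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open('./plan/visual_encoder/visual_encoder_fp16.plan', 'rb') as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

# For a Qwen-VL visual encoder I'd expect something like (-1, 3, 448, 448)
# here; a 224 in the spatial dims would mean the engine itself was built at
# the wrong size rather than run.py feeding the wrong image.
print(engine.get_tensor_shape('input'))
```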
Who can help?
No response
Information
- [X] The official example scripts
- [x] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Follow the qwenvl example instructions to build the engines, then run run.py.
Expected behavior
The input shape should be 448. I need the image embeddings from the visual encoder that was compiled via ONNX.
Actual behavior
Shape mismatch: run.py could not set an input shape of 224 when the engine needs 448.
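One way to pin down which side is wrong: inspect one of the tensors that vit_process later torch.load()s (the path below is a placeholder for whatever --input_dir contains):

```python
import torch

# Placeholder path: one of the preprocessed image tensors run.py loads.
image = torch.load('image.pt', map_location='cpu')
print(image.shape)  # a 448-input Qwen-VL pipeline should give (1, 3, 448, 448)
```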
Additional notes
I've narrowed this down to the visual encoder in vit_process:
```python
def vit_process(image_path, engine_dir, stream):
    vit_path = os.path.join(engine_dir,
                            'visual_encoder/visual_encoder_fp16.plan')
    logger.info(f'Loading engine from {vit_path}')
    with open(vit_path, 'rb') as f:
        engine_buffer = f.read()
    logger.info(f'Creating session from engine {vit_path}')
    session_vit = Session.from_serialized_engine(engine_buffer)

    device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
    images_list = []
    for img in image_path:
        for v in img.values():
            # Each entry is a preprocessed image tensor saved to disk; its
            # spatial size is whatever the preprocessing step produced.
            image = torch.load(v)
            if image.device.type == 'cpu':
                image = image.to(device)
            images_list.append(image)
    images = torch.cat(images_list)
    batch_size = images.size(0)
    images = images.expand(batch_size, -1, -1, -1).contiguous()

    # The input shape handed to the engine comes straight from the loaded
    # tensors -- this is where the 224-vs-448 mismatch surfaces.
    visual_inputs = {'input': images.float()}
    visual_output_info = session_vit.infer_shapes(
        [TensorInfo('input', trt.DataType.FLOAT, images.shape)])
    visual_outputs = {
        t.name: torch.empty(tuple(t.shape),
                            dtype=trt_dtype_to_torch(t.dtype),
                            device='cuda')
        for t in visual_output_info
    }

    profiler.start("ViT")
    run_time = 1
    for _ in range(run_time):
        ok = session_vit.run(visual_inputs, visual_outputs, stream)
    profiler.stop("ViT")
    Vit_time = profiler.elapsed_time_in_sec("ViT") / run_time
    logger.info(f'TensorRT-LLM ViT latency: {Vit_time} sec ')
    assert ok, "Runtime execution failed for vit session"

    image_embeds = visual_outputs['output']
    return image_embeds
```
```python
if __name__ == '__main__':
    args = parse_arguments()
    stream = torch.cuda.current_stream().cuda_stream
    tensorrt_llm.logger.set_level(args.log_level)
    image_embeds = vit_process(args.input_dir, args.vit_engine_dir, stream)
    qinfer = QWenInfer(args.tokenizer_dir, args.qwen_engine_dir,
                      args.log_level, args.output_csv, args.output_npy,
                      args.num_beams)
    qinfer.qwen_model_init()
    qinfer.qwen_infer(image_embeds, args.images_path, args.input_text,
                      args.max_new_tokens, history=[])
```
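If the loaded tensors really come out at 224x224 while the engine wants 448, a stopgap (purely a sketch, not the official fix) would be to upsample the batch right after the torch.cat in vit_process, before visual_inputs is built:

```python
import torch
import torch.nn.functional as F

def resize_to_448(images: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper, not part of run.py: upsample an (N, 3, H, W)
    batch to the 448x448 input the visual encoder engine expects."""
    if images.shape[-2:] != (448, 448):
        images = F.interpolate(images.float(), size=(448, 448),
                               mode='bicubic', align_corners=False)
    return images
```

The real fix is presumably to make the preprocessing and the engine build agree on 448, but this at least confirms the diagnosis.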