
Request is blocked when gpt_model_type=inflight_fused_batching, serving a Baichuan model


Hello,

I am currently experiencing an issue with triton-inference-server/tensorrtllm_backend while trying to serve a Baichuan model.

Description

I have set gpt_model_type=inflight_fused_batching in my model configuration, but when I send a request to the server on port 8000, the request stays in processing indefinitely and never produces any output.
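
For context, this parameter is set in the model's config.pbtxt. A minimal sketch of the relevant block, assuming the layout used in the repository's example configs:

```
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "inflight_fused_batching"
  }
}
```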

Triton Information

I am using the latest commit from the main branch (e8ae70c583f8353a7dfebb1b424326a633b9360e). Here is my GPU device info:

[screenshot: GPU device info]

To Reproduce

Steps to reproduce the behavior:

  1. Set gpt_model_type=inflight_fused_batching in model configuration.

  2. Send a request to the Triton server on port 8000 (a sketch of such a request follows this list).

  3. Observe that the request stays in processing with no output. [screenshot: request hanging with no response]

  4. Some possibly related information captured with pstack: [screenshot: pstack output]
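
For reference, a minimal sketch of the request from step 2, assuming the default ensemble model name from the repository's examples and Triton's HTTP generate endpoint; the prompt and field values are placeholders:

```python
import requests

# Hypothetical payload matching the example ensemble's inputs;
# adjust the fields to your own model repository.
payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 64,
    "bad_words": "",
    "stop_words": "",
}

# In the failure described above this call never returns; a timeout
# makes the hang visible instead of blocking forever.
try:
    resp = requests.post(
        "http://localhost:8000/v2/models/ensemble/generate",
        json=payload,
        timeout=60,
    )
    print(resp.status_code, resp.text)
except requests.exceptions.Timeout:
    print("Request timed out: the server never produced output")
```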

I would expect the server to process the request.

Thank you for your help.

burling, Nov 27 '23