
Qwen2-14B inference garbled

Open kazyun opened this issue 1 year ago • 2 comments

System Info

When running inference with the Qwen2-14B engine through the run.py script, the output is normal. However, when serving the same engine through Triton, some characters in the output are garbled and the output is incomplete compared to the script's results. What could be causing this?

The config.pbtxt may be causing the problem.
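If the config.pbtxt is indeed the culprit, the first place to check is usually the postprocessing model's tokenizer settings: a wrong `tokenizer_dir` or special-token handling can produce output that differs from run.py. A minimal sketch of the relevant parameter block, with the path and parameter names assumed from the backend's example configs (verify them against the config.pbtxt shipped in your version of tensorrtllm_backend):

```
parameters {
  key: "tokenizer_dir"
  # Hypothetical path: must point at the same tokenizer run.py uses.
  value: { string_value: "/path/to/Qwen2-14B" }
}
parameters {
  key: "skip_special_tokens"
  value: { string_value: "true" }
}
```

If Triton's output is truncated rather than garbled, also compare the `max_tokens` / end-token settings in the request against the values run.py passes.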

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Start the Triton server.

Expected behavior

Get the same results as the run.py script.

actual behavior

Some characters in Triton's output are garbled, and the output is incomplete compared to the results from the run.py script.
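One common cause of garbled characters with CJK text (not confirmed to be the cause here, just a frequent one in streaming setups) is decoding each token's bytes in isolation: a multi-byte UTF-8 character whose bytes span two tokens turns into replacement characters. A self-contained sketch of the effect, using raw bytes rather than the real Qwen2 tokenizer:

```python
# Illustration only: a multi-byte UTF-8 character split across two
# "tokens" is garbled when each chunk is decoded separately.
text = "你好"                 # two CJK characters, 3 UTF-8 bytes each
data = text.encode("utf-8")   # 6 bytes total

# Simulate a detokenizer flushing mid-character: the first chunk
# carries 4 bytes, the second carries the remaining 2 bytes.
chunk_a = data[:4].decode("utf-8", errors="replace")
chunk_b = data[4:].decode("utf-8", errors="replace")
print(chunk_a + chunk_b)      # garbled: U+FFFD replacement chars appear

# Decoding the accumulated byte stream once yields the correct text.
print(data.decode("utf-8"))   # prints 你好
```

If this matches what you see, comparing how run.py accumulates tokens before decoding against the Triton postprocessing path would be a good next step.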

additional notes

None.

kazyun avatar Sep 20 '24 03:09 kazyun