TensorRT-LLM

Results differ between 0.9.0 and 0.10.0, and speed decreased after updating the version

Open sundayKK opened this issue 1 year ago • 1 comments

System Info

  • CPU: x86
  • GPU: A100
  • OS: Red Hat
  • Driver: 535.154.05

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I use the same models in both setups: vicuna-7b-v1.3 and medusa-vicuna-7b-v1.3.

With version 0.9.0 and the image nvidia/cuda:12.1.0, the input ' Once upon' gives the following response and output-token speed: (screenshot: 企业微信截图_17206069313952)

After updating to version 0.10.0 with the 12.4.0 image, the response changed and the speed decreased: (screenshot: 企业微信截图_17206081811032)

I also ran the same model with vLLM and got the same response as with version 0.9.0. Why did the result change and the speed decrease after the update? Thanks.

The one difference I noticed between the two versions is the temperature: 0.9.0 uses temperature=0.0, while 0.10.0 uses temperature=1.0.
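To illustrate why a temperature difference alone could change the response, here is a minimal, library-independent sketch. It does not use the TensorRT-LLM API; the `sample_token` helper and the logits are made up for illustration. It assumes temperature 0.0 is treated as greedy decoding (always pick the highest-scoring token), while temperature 1.0 samples from the softmax distribution.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at the given temperature (hypothetical helper)."""
    if temperature == 0.0:
        # Assumed convention: temperature 0.0 means greedy decoding,
        # so the highest-logit token is always chosen.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.2]  # made-up scores for three candidate tokens
rng = random.Random(0)

# Collect the distinct tokens chosen over many draws at each temperature.
greedy = {sample_token(logits, 0.0, rng) for _ in range(1000)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(1000)}

print("temperature=0.0 picks:", greedy)
print("temperature=1.0 picks:", sampled)
```

With temperature 0.0 the same token is chosen on every draw, while temperature 1.0 yields several different tokens, so two runs that differ only in temperature need not produce the same text.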

Expected behavior

After updating the version, speed should improve or stay consistent with the old version, and the model output should not change.

Actual behavior

After updating the version, the result is different and the speed is slower.

Additional notes

See Reproduction.

sundayKK, Jul 10 '24 10:07

I see the same issue with Llama-3 70B: the v0.10.0 engine runs 0.5-1.5 seconds slower than the same engine in v0.9.0.
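For comparing the two versions on equal terms, a small timing harness like the sketch below may help. It is only an assumption about how one might measure this: `tokens_per_second` and `fake_generate` are hypothetical names, and `fake_generate` stands in for whatever call actually runs the engine and returns the output tokens.

```python
import time

def tokens_per_second(generate, prompt, runs=3):
    """Average output tokens per second over several runs of `generate`."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

def fake_generate(prompt):
    # Hypothetical stand-in for a real engine call: waits briefly,
    # then returns 32 dummy token ids.
    time.sleep(0.01)
    return list(range(32))

rate = tokens_per_second(fake_generate, " Once upon")
print(f"{rate:.1f} tokens/s")
```

Running the same harness with the same prompt, output length, and sampling settings against each version would make the slowdown easier to compare.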

ghost, Jul 16 '24 05:07

@sundayKK, please try the latest version of TensorRT-LLM.

hello-11, Nov 14 '24 05:11