TensorRT-LLM
FP8 inference of Qwen-14B on H20 results in a divide-by-zero error.
System Info
Device: H20
Driver: 550.90.07
Python environment:
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
tensorrt                  10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.12.0.dev2024071600
model
Qwen-14B: https://huggingface.co/Qwen/Qwen-14B
expected behavior
Expected the run to print performance and evaluation data normally.
actual behavior
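(The error log was not captured above. As background, one common source of a divide-by-zero during FP8 quantization is a zero scaling factor: per-tensor scales are typically derived from the tensor's max absolute value, so an all-zero tensor yields a zero scale. The sketch below is plain NumPy to illustrate that failure mode — it is not TensorRT-LLM code, and the actual root cause may differ.)

```python
import numpy as np

# Largest representable magnitude in FP8 E4M3 (the format commonly used
# for FP8 weights/activations).
FP8_E4M3_MAX = 448.0

def fp8_quantize(x: np.ndarray) -> np.ndarray:
    """Illustrative per-tensor FP8 quantization (not the library's code).

    scale = amax / FP8_E4M3_MAX; an all-zero input gives amax == 0,
    so scale == 0 and the division by scale blows up -- one plausible
    way a divide-by-zero surfaces during FP8 calibration.
    """
    scale = np.abs(x).max() / FP8_E4M3_MAX
    if scale == 0.0:
        raise ZeroDivisionError("all-zero tensor yields a zero FP8 scale")
    return np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
```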