TensorRT-LLM
FP8 inference of Qwen-14B on H20 results in a divide-by-zero error.
System Info
Device: H20
Driver: 550.90.07
Python environment:
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
tensorrt                  10.0.1
tensorrt-cu12-bindings    10.0.1
tensorrt-cu12-libs        10.0.1
tensorrt-llm              0.12.0.dev2024071600
model
Qwen-14B: https://huggingface.co/Qwen/Qwen-14B
expected behavior
Expected the run to print performance and evaluation data normally.
actual behavior
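(The error log was not captured above. As background, one common source of a divide-by-zero during FP8 quantization is a zero scaling factor: per-tensor scales are typically derived from the tensor's max absolute value, so an all-zero tensor yields a zero scale. The sketch below is plain NumPy to illustrate that failure mode — it is not TensorRT-LLM code, and the actual root cause may differ.)

```python
import numpy as np

# Largest representable magnitude in FP8 E4M3 (the format commonly used
# for FP8 weights/activations).
FP8_E4M3_MAX = 448.0

def fp8_quantize(x: np.ndarray) -> np.ndarray:
    """Illustrative per-tensor FP8 quantization (not the library's code).

    scale = amax / FP8_E4M3_MAX; an all-zero input gives amax == 0,
    so scale == 0 and the division by scale blows up -- one plausible
    way a divide-by-zero surfaces during FP8 calibration.
    """
    scale = np.abs(x).max() / FP8_E4M3_MAX
    if scale == 0.0:
        raise ZeroDivisionError("all-zero tensor yields a zero FP8 scale")
    return np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
```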