
int8_kv_cache scale

Open liquanfeng opened this issue 1 year ago • 1 comment

https://github.com/NVIDIA/TensorRT-LLM/blob/89ba1b1a67d570e41b03da87e5518eaff0d31fbf/tensorrt_llm/models/llama/convert.py#L757

I'm puzzled as to why the act_range of q_proj is included when computing the scale for int8_kv_cache. That scale is only used to quantize the outputs of k_proj and v_proj, so q_proj's activation range shouldn't affect it.
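To illustrate the point, here is a minimal sketch (with hypothetical names, not TensorRT-LLM's actual code) of how an int8 KV-cache scale would be derived from only the k_proj and v_proj output ranges, since only the K and V outputs are ever written to the cache:

```python
def int8_kv_cache_scale(k_out_absmax: float, v_out_absmax: float) -> float:
    """Map the largest |activation| observed on the k_proj / v_proj
    outputs during calibration to int8's symmetric range [-127, 127].
    q_proj's activation range is deliberately excluded: its output is
    never stored in the KV cache, so it cannot affect the scale."""
    absmax = max(k_out_absmax, v_out_absmax)
    return absmax / 127.0

# Example with made-up calibration statistics:
scale = int8_kv_cache_scale(k_out_absmax=3.2, v_out_absmax=4.8)
# A value x would then be quantized as: q = clamp(round(x / scale), -127, 127)
```

Folding q_proj's range into `absmax` would only matter if q_proj produced larger activations than k_proj or v_proj, in which case the cache scale becomes unnecessarily coarse and wastes int8 precision on K/V values.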

liquanfeng avatar May 07 '24 18:05 liquanfeng

Reassigning to @thorjohnsen

poweiw avatar May 16 '25 21:05 poweiw

@liquanfeng, thanks for catching that! Although the code is still present here, it appears to be specific to the old TensorRT backend. The PyTorch backend, which is now the preferred one, appears to be unaffected.

karljang avatar Oct 21 '25 06:10 karljang