
[Question] Why delete q_b_scale kv_b_scale k_b_trans_scale

Open bobbych94 opened this issue 9 months ago • 0 comments

Why were the following parameters removed from the gpt_attention function:

q_b_scale: Optional[Tensor] = None,
kv_b_scale: Optional[Tensor] = None,
k_b_trans_scale: Optional[Tensor] = None,

along with is_fp8_model_flag, in the latest code? Can anyone explain the reason?
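For context, the removed arguments were optional tensors defaulting to None, as the quoted fragment shows. The sketch below illustrates what such a signature looks like and how the optional FP8 scales would be detected at call time; note that gpt_attention_sketch and the stub Tensor class are hypothetical stand-ins for illustration only, not the real TensorRT-LLM API.

```python
from typing import Optional


class Tensor:
    """Hypothetical stand-in for the framework's Tensor type."""
    pass


def gpt_attention_sketch(
    qkv: Tensor,
    # The optional FP8 scale tensors the question refers to; in the
    # older signature they defaulted to None:
    q_b_scale: Optional[Tensor] = None,
    kv_b_scale: Optional[Tensor] = None,
    k_b_trans_scale: Optional[Tensor] = None,
    is_fp8_model_flag: bool = False,
):
    # Sketch only: report which optional scales the caller supplied,
    # the way a kernel wrapper might branch on them.
    return {
        "q_b_scale": q_b_scale is not None,
        "kv_b_scale": kv_b_scale is not None,
        "k_b_trans_scale": k_b_trans_scale is not None,
        "is_fp8": is_fp8_model_flag,
    }
```

With all defaults, no FP8 path is selected; passing any scale tensor flags it as present.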

bobbych94 · Mar 21, 2025