TensorRT-LLM
[Question] Why were q_b_scale, kv_b_scale, and k_b_trans_scale deleted?
Why did the latest code remove the following parameters from the `gpt_attention` function,

```python
q_b_scale: Optional[Tensor] = None,
kv_b_scale: Optional[Tensor] = None,
k_b_trans_scale: Optional[Tensor] = None,
```

along with the `is_fp8_model_flag` flag?
Can anyone explain the reason?
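For context, my understanding is that these parameters are per-tensor FP8 dequantization scales for the MLA (multi-head latent attention) up-projection weights (q_b_proj / kv_b_proj, plus the transposed k_b_proj). Below is a minimal sketch of what such a per-tensor scale typically does; the names and shapes are assumptions inspired by the parameter names, not TensorRT-LLM's actual kernel code:

```python
import torch

# Illustrative shapes for a DeepSeek-style MLA layer (assumptions, not
# TensorRT-LLM's actual configuration).
q_lora_rank, out_dim = 32, 64

def fp8_quantize(w: torch.Tensor):
    """Per-tensor FP8 quantization: returns the quantized weight plus the
    dequantization scale (amax / fp8_e4m3_max)."""
    fp8_max = 448.0  # largest finite value representable in e4m3
    scale = w.abs().max() / fp8_max
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

torch.manual_seed(0)
w_q_b = torch.randn(q_lora_rank, out_dim)      # hypothetical q_b_proj weight
w_q_b_fp8, q_b_scale = fp8_quantize(w_q_b)     # what q_b_scale would hold

# The GEMM runs on FP8 data; the per-tensor scale dequantizes the output.
c_q = torch.randn(1, q_lora_rank)              # compressed query latent
q = (c_q @ w_q_b_fp8.to(torch.float32)) * q_b_scale
print((q - c_q @ w_q_b).abs().max())           # small quantization error
```

If the scales were folded into some other quantization path rather than dropped outright, a pointer to the relevant commit or PR would be appreciated.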