TensorRT-LLM
use selected index past past key value in attention when using contin…
When using a continuous KV cache, gpt_attention always uses the first past_key_value instead of past_key_value[selected_index]. This produces incorrect results whenever the continuous KV cache values are non-zero.
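A minimal sketch of the bug pattern described above (not the actual TensorRT-LLM implementation; the function names and toy cache values are hypothetical). With a continuous KV cache there is one cache entry per sequence slot, and attention must read the entry for the slot being decoded rather than always reading entry 0:

```python
def attend_buggy(past_key_values, selected_index):
    # Bug pattern: ignores selected_index and always reads the first cache.
    return past_key_values[0]


def attend_fixed(past_key_values, selected_index):
    # Fix: index into the per-slot caches with the selected index.
    return past_key_values[selected_index]


# Toy per-slot KV caches; the bug is invisible only when all slots hold zeros.
caches = [[0.0], [1.5], [2.5]]
print(attend_buggy(caches, 2))  # [0.0] -> wrong when slot 2 is non-zero
print(attend_fixed(caches, 2))  # [2.5] -> correct
```

This also matches the description's note that the error only shows up when the continuous KV cache values are non-zero: with all-zero caches, slot 0 and the selected slot happen to contain the same data.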
@Eayne
Hi, since TensorRT-LLM became GitHub-first as of last Monday, please rebase your MR on the latest main if you still want to contribute this.
Thanks June
Closing since no response after https://github.com/NVIDIA/TensorRT-LLM/pull/2682#issuecomment-2761036095. Feel free to reopen! @Eayne