
Use selected-index past key value in attention when using contin…

Eayne opened this issue 11 months ago • 1 comment

When using a continuous KV cache, gpt_attention uses only the first past_key_value instead of past_key_value[selected_indexed]. This produces incorrect results whenever the values in the continuous KV caches are not zeros.
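A minimal sketch of the indexing difference described above, assuming a toy representation where each continuous KV cache is just a list of key vectors. The helpers attention_scores, buggy_lookup and fixed_lookup are hypothetical and for illustration only; they are not TensorRT-LLM functions and do not mirror the real gpt_attention implementation.

```python
from typing import List

Cache = List[List[float]]  # one cache: a list of key vectors


def attention_scores(query: List[float], cached_keys: Cache) -> List[float]:
    """Dot-product score of one query vector against each cached key vector."""
    return [sum(q * k for q, k in zip(query, key)) for key in cached_keys]


def buggy_lookup(past_key_values: List[Cache], selected_index: int) -> Cache:
    # Hypothetical: reproduces the reported behaviour by always reading the
    # first cache and ignoring selected_index.
    return past_key_values[0]


def fixed_lookup(past_key_values: List[Cache], selected_index: int) -> Cache:
    # Hypothetical: reads the cache the caller actually selected.
    return past_key_values[selected_index]


# Two non-zero caches, each holding two key vectors.
past_key_values = [
    [[1.0, 0.0], [0.0, 1.0]],  # cache 0
    [[2.0, 2.0], [3.0, 1.0]],  # cache 1
]
query = [1.0, 1.0]
selected_index = 1

buggy = attention_scores(query, buggy_lookup(past_key_values, selected_index))
fixed = attention_scores(query, fixed_lookup(past_key_values, selected_index))
print(buggy)  # [1.0, 1.0]
print(fixed)  # [4.0, 4.0]

# With all-zero caches both lookups give the same scores, so the bug stays hidden.
zero_caches = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
print(attention_scores(query, buggy_lookup(zero_caches, 1))
      == attention_scores(query, fixed_lookup(zero_caches, 1)))  # True
```

The zero-cache comparison at the end matches the observation in the report: reading the wrong cache only shows up as a wrong result once the caches hold non-zero values.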

Eayne, Jan 13 '25

@Eayne

Hi, since TensorRT-LLM became GitHub-first last Monday, please refresh your MR against the latest main branch if you still want to contribute this.

Thanks,
June

juney-nvidia, Mar 28 '25

Closing since there has been no response after https://github.com/NVIDIA/TensorRT-LLM/pull/2682#issuecomment-2761036095. Feel free to reopen! @Eayne

poweiw, May 28 '25