InfLLM
Implementation of Streaming-llm
Thanks for the great work! I noticed that the implementation of StreamingLLM fixes the positions of the n_init tokens (https://github.com/thunlp/InfLLM/blob/main/inf_llm/attention/stream_llm.py#L69), while the original StreamingLLM paper says the n_init tokens use different positions. Does the implementation have a problem?
Hi, section 3.2 of the paper says "StreamingLLM focuses on positions within the cache rather than those in the original text". We implement this by placing, for every query token, the init tokens within the first n_init positions of the KV-cache window that query sees.
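To illustrate the idea (not the repo's actual code), here is a minimal sketch of assigning position ids within the cache rather than from the original text; the function name `cache_position_ids` and the parameters `num_init` / `num_local` are hypothetical:

```python
import torch

def cache_position_ids(num_init: int, num_local: int) -> torch.Tensor:
    """Sketch: positions are counted within the cache, not the original text.
    Init tokens always occupy cache positions 0..num_init-1, and the retained
    local-window tokens follow at num_init..num_init+num_local-1, regardless
    of how far back in the original sequence they actually occurred."""
    init_pos = torch.arange(num_init)                          # fixed slots for init tokens
    local_pos = torch.arange(num_init, num_init + num_local)   # contiguous slots for the local window
    return torch.cat([init_pos, local_pos])

# Example: with 4 init tokens and a local window of 8, every query attends to
# keys at cache positions 0..11, no matter what their absolute text positions were.
print(cache_position_ids(4, 8))  # tensor([ 0,  1,  2, ..., 11])
```

So the init tokens keeping a fixed position is consistent with the paper's "positions within the cache" formulation: only the distance inside the cache matters, not the absolute position in the text.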