jq-wei


In particular, after prefilling there is an attention loop over seq_len - self.max_capacity_prompt + 1 tokens. What is this loop for? After it finishes, decoding starts, but it appears to use the full KV cache rather than the compressed one.
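For concreteness, the token count being asked about can be restated as a tiny helper. This is only an illustration of the arithmetic in the question (the function name and numbers are hypothetical, not the repository's actual code):

```python
def extra_attention_steps(seq_len: int, max_capacity_prompt: int) -> int:
    """Per-token attention steps observed after prefilling, as described
    in the comment: seq_len - max_capacity_prompt + 1."""
    return seq_len - max_capacity_prompt + 1

# Example: a 4096-token prompt with a cache capacity of 1024 would leave
# 4096 - 1024 + 1 = 3073 such steps before decoding proper begins.
print(extra_attention_steps(4096, 1024))  # -> 3073
```

If the cache were truly truncated to max_capacity_prompt entries, one would expect decoding to attend over at most that many keys/values, which is why the full-cache behaviour seems surprising.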