jq-wei
In particular, after prefilling there is an attention loop over seq_len - self.max_capacity_prompt + 1 tokens. What is this loop for? After it finishes, decoding starts, but it seems to use the full KV cache.
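To make the question concrete, here is a minimal sketch of the token count being asked about. This is an assumption about the flow, not the repository's actual code: if the bulk prefill covers the prompt up to the cache budget, the remaining tokens are attended one at a time, and `prompt_loop_length` (a hypothetical helper) gives how many single-token steps that loop runs before decoding begins.

```python
def prompt_loop_length(seq_len: int, max_capacity_prompt: int) -> int:
    """Number of per-token attention steps after the bulk prefill,
    assuming the loop covers seq_len - max_capacity_prompt + 1 tokens
    as described in the question."""
    if seq_len <= max_capacity_prompt:
        # The whole prompt fits within the cache budget: no extra loop.
        return 0
    return seq_len - max_capacity_prompt + 1


# Example: a 4096-token prompt with a 1024-entry cache budget would
# run the loop for 4096 - 1024 + 1 = 3073 single-token steps.
print(prompt_loop_length(4096, 1024))
```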