ktransformers
[Feature] Insert generated response into KV cache
Checklist
- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
- [x] 2. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-English/Chinese content without translation may be closed.
Motivation
I noticed that when I generate responses in a single-user chat (alternating user and AI messages), the last AI message gets re-processed on the next turn. After the AI has generated a message and I send a new prompt, the model processes that AI message again as if it were new input. This is unnecessary: the generated tokens could be inserted into the KV cache during generation, so the next turn would only need to process the new prompt.
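To make the cost difference concrete, here is a minimal sketch of prefix reuse. It does not use ktransformers code: `ToyKVCache`, `prefill`, `append_generated` and `run_turns` are hypothetical names, and the "cache" only tracks which token ids have entries rather than storing real key/value tensors. It counts how many tokens must be processed on each turn with and without recording the generated reply in the cache.

```python
from typing import List, Tuple


class ToyKVCache:
    """Hypothetical stand-in for a KV cache: it only records which token ids
    already have key/value entries, not the tensors themselves."""

    def __init__(self) -> None:
        self.tokens: List[int] = []

    def common_prefix_len(self, tokens: List[int]) -> int:
        # Length of the shared prefix between cached tokens and the new input.
        n = 0
        for a, b in zip(self.tokens, tokens):
            if a != b:
                break
            n += 1
        return n

    def prefill(self, tokens: List[int]) -> int:
        """Reuse the cached prefix, drop any diverging tail, and 'process'
        only the remaining tokens. Returns how many tokens were processed."""
        keep = self.common_prefix_len(tokens)
        del self.tokens[keep:]
        new = tokens[keep:]
        self.tokens.extend(new)
        return len(new)

    def append_generated(self, token: int) -> None:
        # Record a token produced during decoding (the behavior requested here).
        self.tokens.append(token)


def run_turns(cache_generated: bool) -> List[int]:
    """Simulate a two-turn chat; return tokens processed at each prefill."""
    cache = ToyKVCache()
    history: List[int] = []
    processed: List[int] = []
    # (user prompt tokens, AI reply tokens) per turn; ids are arbitrary.
    turns: List[Tuple[List[int], List[int]]] = [
        ([1, 2, 3], [10, 11, 12, 13]),
        ([4, 5], [14, 15]),
    ]
    for prompt, reply in turns:
        history += prompt
        processed.append(cache.prefill(history))
        if cache_generated:
            for tok in reply:
                cache.append_generated(tok)
        history += reply
    return processed


print(run_turns(cache_generated=True))   # [3, 2]
print(run_turns(cache_generated=False))  # [3, 6]
```

Without caching the reply, the second turn re-processes the four AI tokens plus the two new prompt tokens. With caching, only the two new prompt tokens are processed. The sketch simplifies one detail: in a real decoder, the last sampled token only gets a KV entry once it is fed back as input, so roughly one token may still need processing on the next turn.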
Related resources
No response