llama.cpp
[Feature Suggestion] Load/save the current conversation's tokens to a file
Now that we have infinite transcription mode, would it be possible to dump the tokens to a file and load them back the next time you run llama.cpp to resume the conversation?
It will be tricky to implement efficiently for long conversations, though. One approach could be to (see the sketch after this list):
- store the initial prompt itself as tokens
- store the in-between messages as raw text
- store the most recent messages that fit within ctx_size as tokens
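For the save/load part itself, here is a minimal sketch, assuming llama_token is the 32-bit integer type typedef'd in llama.h; the raw-int32 file format is purely illustrative, not an official one:

```cpp
// Illustrative sketch: dump the conversation's tokens to a binary file and
// read them back. Assumes llama_token is a 32-bit integer (as in llama.h).
#include <cstdio>
#include <vector>

#include "llama.h"

static bool save_tokens(const std::vector<llama_token> & tokens, const char * path) {
    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(tokens.data(), sizeof(llama_token), tokens.size(), f) == tokens.size();
    std::fclose(f);
    return ok;
}

static std::vector<llama_token> load_tokens(const char * path) {
    std::vector<llama_token> tokens;
    FILE * f = std::fopen(path, "rb");
    if (!f) return tokens;

    llama_token t;
    while (std::fread(&t, sizeof(t), 1, f) == 1) {
        tokens.push_back(t);
    }
    std::fclose(f);
    // The loaded tokens still have to be evaluated again (llama_eval /
    // llama_decode, depending on the version) to rebuild the context.
    return tokens;
}
```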
Yes, this feature seems especially important: it would avoid having to run inference on the initial prompt every time, allowing faster startup (as discussed here: https://github.com/ggerganov/llama.cpp/issues/484#issuecomment-1483975437).
This is kinda related and would fit well together:
- https://github.com/ggerganov/llama.cpp/pull/477
@linouxis9 you are talking about a different thing though: saving the state, not just the tokens. Separation of state and model is part of the current roadmap.
Saving/loading the state needs a big file but loads fast, while saving/loading the tokens needs only a tiny file but still requires running inference as usual.
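To put rough, illustrative numbers on that trade-off (assuming a 7B LLaMA-class model with 32 layers, embedding size 4096, and a 2048-token context): a full fp16 KV cache is about 2 × 32 × 2048 × 4096 × 2 bytes ≈ 1 GiB, while the same 2048 tokens stored as 32-bit integers are just 2048 × 4 bytes = 8 KiB, at the cost of having to re-evaluate them on load.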
Someone please correct me if my analogy is bad, but I'll try to explain the difference with a real-world analogy:
Mission: You enter your car outside your home. You need to get to work.
Option A (Load tokens): You start with a blank memory. You get the instructions for driving to work, ⬆⬆⬇⬇⬅➡⬅➡🅱🅰, and you drive there following them; this takes some time. You see a friendly alpaca on your way there. You remember how you got there and that you saw a friendly alpaca on the way.
Option B (Load state): You are implanted with a memory of driving to work and seeing a friendly alpaca, then instantly teleported to work. You remember how you got there and that you saw a friendly alpaca on the way.
These will both work.
Option C (This can't work): You start with a blank memory. You are teleported to work with the instructions on how to drive to work. You don't remember how you got there, nor that you saw a friendly alpaca on the way. Even if you worked the instructions backwards somehow, you still couldn't possibly know about the friendly alpaca.
So basically Option A could also be implemented by just passing the previous conversation back as the initial prompt? While what I'm more interested in is Option B, where we don't have to drive all the way to work again ;-) I understand, thank you for your great analogy @anzz1!!
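For reference, here is a minimal sketch of Option B, assuming the state-serialization functions exposed in llama.h (llama_get_state_size, llama_copy_state_data, llama_set_state_data; names and availability vary across versions), so this is an illustration rather than the planned implementation:

```cpp
// Illustrative sketch: persist and restore the full context state
// (KV cache, RNG state, etc.) via the llama.h state API.
#include <cstdint>
#include <cstdio>
#include <vector>

#include "llama.h"

static bool save_state(llama_context * ctx, const char * path) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, buf.data()); // serialize the context into buf

    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(buf.data(), 1, buf.size(), f) == buf.size();
    std::fclose(f);
    return ok;
}

static bool load_state(llama_context * ctx, const char * path) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    const long n = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (n <= 0) { std::fclose(f); return false; }

    std::vector<uint8_t> buf((size_t) n);
    const bool ok = std::fread(buf.data(), 1, buf.size(), f) == buf.size();
    std::fclose(f);
    if (ok) {
        llama_set_state_data(ctx, buf.data()); // the "teleport": no re-evaluation needed
    }
    return ok;
}
```

The saved blob contains the whole KV cache, so it is roughly the size estimated earlier in the thread; that is the price of the instant teleport.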
This issue was closed because it has been inactive for 14 days since being marked as stale.