llama.cpp
feat: '--in-prefix STRING' option
The --in-prefix STRING command line option prefixes each user input with STRING.
For example, when chatting with Bob:
./main -m ./models/llama-13B-ggml/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 -f ./prompts/chat-with-bob.txt -i -r "User:" --in-prefix " "
This adds a space after the reverse prompt "User:".
So instead of
Bob: How can I help you?
User:_
it's
Bob: How can I help you?
User: _
and matches the original prompt better.
It could be useful for other prompts too, e.g. for alignment, or for testing multiple similar questions like "What do you think about X".
Personally I don't see much value in this change. For your case specifically you could just add the space to the reverse prompt: "User:" -> "User: ".
Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.
> Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.

That sounds like a bug.
> Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.

> That sounds like a bug.

It's because the reverse prompt only tests the last output. "user:" and " " are two different tokens, so it doesn't work. I don't know if it should be changed though.
In any case there is a different value in this: you would not want to change -r "User:" to -r "User: " even if it worked, because you would then be using two tokens as the reverse prompt, "User:" and " ". You do not want that; you want the space to be part of the user input, because then if the user types "Hey" the tokens can be "User:" and " Hey", or also "User:", " " and "Hey". If you forced a space token there it would remove an option. This is an important distinction, and it is why you want to use -r "User:" and --in-prefix " " as separate parameters.
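To make the distinction concrete, here is a minimal standalone sketch (assumed names and a deliberately simplified tokenization, not actual llama.cpp code) of why a reverse prompt that forces the space into its own token only matches one of the tokenizations the model may produce:

```cpp
// Minimal standalone sketch (assumed names, simplified tokenization, not
// actual llama.cpp code): a reverse prompt that carries the space as its
// own token only matches one of the tokenizations the model may emit.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// does the generated token sequence end with the antiprompt token sequence?
static bool ends_with(const std::vector<std::string> & output, const std::vector<std::string> & anti) {
    return output.size() >= anti.size() && std::equal(anti.rbegin(), anti.rend(), output.rbegin());
}

int main() {
    const std::vector<std::string> anti = {"User:", " "};        // -r "User: " forced into two tokens

    // two ways the same text can come out of the model
    const std::vector<std::string> a = {"\n", "User:", " "};     // space emitted as its own token
    const std::vector<std::string> b = {"\n", "User:", " Hey"};  // space folded into the next token

    std::cout << "matches a: " << ends_with(a, anti) << "\n";    // 1
    std::cout << "matches b: " << ends_with(b, anti) << "\n";    // 0 -> reverse prompt missed
    return 0;
}
```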
> It's because the reverse prompt only tests the last output. "user:" and " " are two different tokens, so it doesn't work. I don't know if it should be changed though.

Actually I think it is because the space is part of the next token, so there is no trailing space to catch...
> Actually I think it is because the space is part of the next token, so there is no trailing space to catch...

True, the space after "user:" can be either a token of its own or part of the next token. The reverse prompt code should be fixed to check more than the last output, so that it can match even when the reverse prompt spans multiple tokens.
Also, I noticed another issue with it at main.cpp#L435: the antiprompt handling should be wrapped in an antiprompt.empty() check, as currently that code runs even if no reverse prompt is used.
Anyway, we are getting derailed here; the point I'm trying to make is that this functionality is not related to the reverse prompt, that was just a usage example.
This simply pre-injects arbitrary text into each user input, which can be used to build various new interactions. It can be used with or without reverse prompts.
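As an illustration, a minimal standalone sketch of that pre-injection idea (the names and the loop are assumptions for illustration, not the PR's actual diff): the --in-prefix string is simply prepended to whatever the user types before the combined text would be tokenized and fed back to the model.

```cpp
// Minimal standalone sketch of the pre-injection idea (not the PR's actual
// diff): the --in-prefix string becomes part of the user input text, so the
// tokenizer is free to merge it with what follows.
#include <iostream>
#include <string>

int main() {
    const std::string in_prefix = " ";   // value passed via --in-prefix
    std::string line;

    while (std::getline(std::cin, line)) {
        // text that would be tokenized and appended to the context;
        // in llama.cpp this is roughly where the tokenizer would be called
        std::string injected = in_prefix + line;
        std::cout << "would tokenize: \"" << injected << "\"\n";
    }
    return 0;
}
```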
@anzz1 The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings for reverse prompts when I first wrote interactive mode.) Therefore, the best you could hope for is that if, say, the generation outputs tokens amounting to "User:", " Hello", control is passed to the user after that because "User: " was in the output. You will not, however, be able to get rid of the "Hello". That would only be possible if we had some means of rolling back generation (which is bound to be computationally expensive either way).
The PR's idea seems like the best we can do to me if we want to simultaneously (1) always correctly detect when reverse prompts of form "Name:" are emitted, (2) not force the user to have to enter the space after that manually, (3) not have the reverse prompt be followed by one model-imposed word like the "Hello" in the above example and (4) don't want to implement generation rollback.
> The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings for reverse prompts when I first wrote interactive mode.) Therefore, the best you could hope for is that if, say, the generation outputs tokens amounting to "User:", " Hello", control is passed to the user after that because "User: " was in the output. You will not, however, be able to get rid of the "Hello". That would only be possible if we had some means of rolling back generation (which is bound to be computationally expensive either way).
Thanks for putting how it works into words better than I could. Yes, implementing rollback doesn't pass cost-benefit analysis.
However, it might be a good idea to put on the backlog scanning the text output between the last interaction and now (not only between the last token and now) after each generated token, to check whether the reverse prompt was emitted; the computation required is insignificant. So, like you said, a -r "User: " would stop generating after "User:", " whatever" instead of going on like it does now. This would need an additional text buffer to be added to the state, and I don't know if it's worth adding such a thing, at least for now. After all, we're trying to keep things lean here, right? :)
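A minimal standalone sketch of that idea (assumed names, not the actual main.cpp code): the text generated since the last interaction is kept in a buffer and searched for the reverse prompt after every token, so a match is found even when the antiprompt is split across several tokens.

```cpp
// Minimal standalone sketch (assumed names, not actual llama.cpp code):
// keep the text generated since the last user interaction and search it
// for the reverse prompt after each token, instead of only inspecting
// the last piece of output.
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string antiprompt = "User: ";

    // pretend detokenized pieces, produced one token at a time
    const std::vector<std::string> pieces = {"Bob", ":", " Sure", ".", "\nUser", ":", " whatever"};

    std::string since_last_interaction;   // the extra text buffer kept in the state
    for (const std::string & piece : pieces) {
        since_last_interaction += piece;

        // checking only `piece` would never match here, because no single
        // token contains the whole "User: " string
        if (since_last_interaction.find(antiprompt) != std::string::npos) {
            std::cout << "antiprompt found -> return control to the user\n";
            break;   // the buffer would be cleared on the next interaction
        }
    }
    return 0;
}
```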
> The PR's idea seems like the best we can do to me if we want to simultaneously (1) always correctly detect when reverse prompts of form "Name:" are emitted, (2) not force the user to have to enter the space after that manually, (3) not have the reverse prompt be followed by one model-imposed word like the "Hello" in the above example and (4) don't want to implement generation rollback.
My communication on what this aims to achieve was less than stellar; this is exactly what I was going for, I just couldn't put it into words properly. You can add whatever you want to the output at basically zero cost.
In the future, with a sliding context window (and the infinite generation that can come with it), it could be great for things like "Please continue": you could just mash enter without having to type it out.
> Yes, implementing rollback doesn't pass cost-benefit analysis.
It should be "pretty cheap"; you just need to track the token index for each char.
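A minimal standalone sketch of that per-char bookkeeping (assumed names, not actual llama.cpp code): every character of decoded output records the index of the token that produced it, so a text match maps directly to the token position a rollback would cut at.

```cpp
// Minimal standalone sketch (assumed names, not actual llama.cpp code):
// track, for each output character, the index of the token that produced
// it; when the reverse prompt is found in the text, that index tells you
// which tokens a rollback would have to drop from the context.
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string antiprompt = "User: ";

    // pretend detokenized pieces, one per generated token
    const std::vector<std::string> pieces = {"\nUser", ":", " whatever"};

    std::string      text;        // decoded output so far
    std::vector<int> char_token;  // char_token[i] = index of the token that produced text[i]

    for (int tok = 0; tok < (int) pieces.size(); ++tok) {
        for (char c : pieces[tok]) {
            text.push_back(c);
            char_token.push_back(tok);
        }

        const size_t pos = text.find(antiprompt);
        if (pos != std::string::npos) {
            // everything from this token onward would be dropped on rollback
            std::cout << "match at char " << pos
                      << ", roll back to token " << char_token[pos] << "\n";
            break;
        }
    }
    return 0;
}
```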