llama.cpp
feat: '--in-prefix STRING' option
The --in-prefix STRING command line option prefixes each user input with STRING.
For example, when chatting with Bob:
./main -m ./models/llama-13B-ggml/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 -f ./prompts/chat-with-bob.txt -i -r "User:" --in-prefix " "
This adds a space after the reverse prompt "User:".
So instead of
Bob: How can I help you?
User:_
it's
Bob: How can I help you?
User: _
and matches the original prompt better.
It could be useful for other prompts too, e.g. for alignment, or for testing multiple similar questions like "What do you think about X".
Personally I don't see much value in this change. For your case specifically you could just add the space to the reverse prompt: "User:" -> "User: ".
Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.
> Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.

That sounds like a bug.
> Reverse prompt with extra space seems to not work for me at least; llama.cpp goes on as if there was no reverse prompt in that case.

> That sounds like a bug.

It's because the reverse prompt only tests the last output. "user:" and " " are two different tokens, so it doesn't work. I don't know if it should be changed though.
In any case there is a different value in this: you would not want to change -r "User:" to -r "User: " even if it worked, because you would then be using two tokens as the reverse prompt, "User:" and " ". You do not want that; you want the space to be part of the user input, because then if the user types "Hey" the tokens can be "User:" and " Hey", or also "User:", " " and "Hey". If you forced a space token there it would remove an option. This is an important distinction, and it is why you want to use -r "User:" and --in-prefix " " as separate parameters.
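To make the distinction concrete, here is a minimal standalone sketch (assumed names and a deliberately simplified tokenization, not actual llama.cpp code) of why a reverse prompt that forces the space into its own token only matches one of the tokenizations the model may produce:

```cpp
// Minimal standalone sketch (assumed names, simplified tokenization, not
// actual llama.cpp code): a reverse prompt that carries the space as its
// own token only matches one of the tokenizations the model may emit.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// does the generated token sequence end with the antiprompt token sequence?
static bool ends_with(const std::vector<std::string> & output, const std::vector<std::string> & anti) {
    return output.size() >= anti.size() && std::equal(anti.rbegin(), anti.rend(), output.rbegin());
}

int main() {
    const std::vector<std::string> anti = {"User:", " "};        // -r "User: " forced into two tokens

    // two ways the same text can come out of the model
    const std::vector<std::string> a = {"\n", "User:", " "};     // space emitted as its own token
    const std::vector<std::string> b = {"\n", "User:", " Hey"};  // space folded into the next token

    std::cout << "matches a: " << ends_with(a, anti) << "\n";    // 1
    std::cout << "matches b: " << ends_with(b, anti) << "\n";    // 0 -> reverse prompt missed
    return 0;
}
```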
> It's because the reverse prompt only tests the last output. "user:" and " " are two different tokens, so it doesn't work. I don't know if it should be changed though.

Actually I think it is because the space is part of the next token, so there is no trailing space to catch...
> Actually I think it is because the space is part of the next token, so there is no trailing space to catch...

True, the space after "user:" can be either a token of its own or part of the next token. The reverse prompt code should be fixed to check more than the last output, so that it can match even when the reverse prompt spans multiple tokens.
Also, I noticed another issue with it at main.cpp#L435: the antiprompt handling should be wrapped in an antiprompt.empty() check, as currently that code runs even if no reverse prompt is used.
Anyway, we are getting derailed here; the point I'm trying to make is that this functionality is not related to the reverse prompt, that was just a usage example.
This simply pre-injects arbitrary text into each user input, which can be used to build various new interactions. It can be used with or without reverse prompts.
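As an illustration, a minimal standalone sketch of that pre-injection idea (the names and the loop are assumptions for illustration, not the PR's actual diff): the --in-prefix string is simply prepended to whatever the user types before the combined text would be tokenized and fed back to the model.

```cpp
// Minimal standalone sketch of the pre-injection idea (not the PR's actual
// diff): the --in-prefix string becomes part of the user input text, so the
// tokenizer is free to merge it with what follows.
#include <iostream>
#include <string>

int main() {
    const std::string in_prefix = " ";   // value passed via --in-prefix
    std::string line;

    while (std::getline(std::cin, line)) {
        // text that would be tokenized and appended to the context;
        // in llama.cpp this is roughly where the tokenizer would be called
        std::string injected = in_prefix + line;
        std::cout << "would tokenize: \"" << injected << "\"\n";
    }
    return 0;
}
```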
@anzz1 The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings for reverse prompts when I first wrote interactive mode.) Therefore, the best you could hope for is that if, say, the generation outputs tokens amounting to "User:", " Hello", control is passed to the user after that because "User: " was in the output. You will not, however, be able to get rid of the "Hello". That would only be possible if we had some means of rolling back generation (which is bound to be computationally expensive either way).
The PR's idea seems like the best we can do to me if we want to simultaneously (1) always correctly detect when reverse prompts of form "Name:" are emitted, (2) not force the user to have to enter the space after that manually, (3) not have the reverse prompt be followed by one model-imposed word like the "Hello" in the above example and (4) don't want to implement generation rollback.
> The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings for reverse prompts when I first wrote interactive mode.) Therefore, the best you could hope for is that if, say, the generation outputs tokens amounting to "User:", " Hello", control is passed to the user after that because "User: " was in the output. You will not, however, be able to get rid of the "Hello". That would only be possible if we had some means of rolling back generation (which is bound to be computationally expensive either way).
Thanks for putting how it works into words better than I could. Yes, implementing rollback doesn't pass cost-benefit analysis.
However, it might be a good idea to put on the backlog scanning the text output between the last interaction and now (not only between the last token and now) after each generated token, to check whether the reverse prompt was emitted; the computation required is insignificant. So, like you said, a -r "User: " would stop generating after "User:", " whatever" instead of going on like it does now. This would need an additional text buffer to be added to the state, and I don't know if it's worth adding such a thing, at least for now. After all, we're trying to keep things lean here, right? :)
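A minimal standalone sketch of that idea (assumed names, not the actual main.cpp code): the text generated since the last interaction is kept in a buffer and searched for the reverse prompt after every token, so a match is found even when the antiprompt is split across several tokens.

```cpp
// Minimal standalone sketch (assumed names, not actual llama.cpp code):
// keep the text generated since the last user interaction and search it
// for the reverse prompt after each token, instead of only inspecting
// the last piece of output.
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string antiprompt = "User: ";

    // pretend detokenized pieces, produced one token at a time
    const std::vector<std::string> pieces = {"Bob", ":", " Sure", ".", "\nUser", ":", " whatever"};

    std::string since_last_interaction;   // the extra text buffer kept in the state
    for (const std::string & piece : pieces) {
        since_last_interaction += piece;

        // checking only `piece` would never match here, because no single
        // token contains the whole "User: " string
        if (since_last_interaction.find(antiprompt) != std::string::npos) {
            std::cout << "antiprompt found -> return control to the user\n";
            break;   // the buffer would be cleared on the next interaction
        }
    }
    return 0;
}
```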
> The PR's idea seems like the best we can do to me if we want to simultaneously (1) always correctly detect when reverse prompts of form "Name:" are emitted, (2) not force the user to have to enter the space after that manually, (3) not have the reverse prompt be followed by one model-imposed word like the "Hello" in the above example and (4) don't want to implement generation rollback.
My communication on what this aims to achieve was less than stellar; this is exactly what I was going for, I just couldn't put it into words properly. You can add whatever you want to the output at basically zero cost.
In the future, with a sliding context window (and the infinite generation that can come with it), it could be great for things like "Please continue": you could just mash enter without having to type it out.
> Yes, implementing rollback doesn't pass cost-benefit analysis.
It should be "pretty cheap"; you just need to track the token index for each char.
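A minimal standalone sketch of that per-char bookkeeping (assumed names, not actual llama.cpp code): every character of decoded output records the index of the token that produced it, so a text match maps directly to the token position a rollback would cut at.

```cpp
// Minimal standalone sketch (assumed names, not actual llama.cpp code):
// track, for each output character, the index of the token that produced
// it; when the reverse prompt is found in the text, that index tells you
// which tokens a rollback would have to drop from the context.
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string antiprompt = "User: ";

    // pretend detokenized pieces, one per generated token
    const std::vector<std::string> pieces = {"\nUser", ":", " whatever"};

    std::string      text;        // decoded output so far
    std::vector<int> char_token;  // char_token[i] = index of the token that produced text[i]

    for (int tok = 0; tok < (int) pieces.size(); ++tok) {
        for (char c : pieces[tok]) {
            text.push_back(c);
            char_token.push_back(tok);
        }

        const size_t pos = text.find(antiprompt);
        if (pos != std::string::npos) {
            // everything from this token onward would be dropped on rollback
            std::cout << "match at char " << pos
                      << ", roll back to token " << char_token[pos] << "\n";
            break;
        }
    }
    return 0;
}
```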