`max_tokens` does not behave as expected when used with `regex`.
The bug
I expect `max_tokens` to stop generation as soon as the token limit is reached, and without the `regex` argument it works perfectly. With a regex, however, it appears to only take effect with the `*` and `+` quantifiers. I'm unsure whether this is a bug or undocumented intended behavior, but either way I would expect `max_tokens` to abandon completing the regex once generation reaches the limit.
To Reproduce
The regex used is supposed to match 1 to 6 lines, where a line may be empty. Setting `max_tokens` to 10 makes `gen` create 1 to 6 lines, each with up to 10 tokens, rather than stopping at 10 tokens total. My expected behavior was for it to complete the word after "strange" ("machine" plus a newline, 2 tokens) and then have only 8 tokens left for the second line.
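As a sanity check on the pattern itself (independent of guidance), Python's `re` module confirms the intended "1 to 6 newline-terminated, possibly empty lines" reading:

```python
import re

pattern = r'([^\n]*\n){1,6}'

# One non-empty line terminated by a newline matches.
assert re.fullmatch(pattern, "machine\n") is not None

# Empty lines are allowed: two bare newlines are two empty lines.
assert re.fullmatch(pattern, "\n\n") is not None

# Six lines match, but a seventh exceeds the {1,6} bound.
assert re.fullmatch(pattern, "a\n" * 6) is not None
assert re.fullmatch(pattern, "a\n" * 7) is None
```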
```python
from guidance import models, gen

lm = models.LlamaCppChat(
    r"C:\Users\user\models\zephyr-7b-beta.Q3_K_S.gguf",
    n_gpu_layers=-1)
# I used song lyrics for the example since they tend to contain frequent line breaks
lm + "Lyrics:\n\nYo! Danny Fenton he was just fourteen\nWhen his parent built a very strange" + gen(regex='([^\n]*\n){1,6}', max_tokens=10)
```
The LM output from gen was this:
```
machine
It was a science experiment, they said
But it turned Danny into a living, breathing head
He could think faster than the speed of light
He could calculate a solution in the blink of an
```
The regex was satisfied (6 lines), but 52 tokens were generated, far more than 10.
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Windows 11
- Guidance Version (`guidance.__version__`): 0.1.13