`max_tokens` does not behave as expected when used with `regex`.
The bug
I expect `max_tokens` to stop generation as soon as the token limit is reached, and without the `regex` argument it works perfectly. With a regex, however, it appears to only take effect with the `*` and `+` quantifiers. I'm unsure whether this is a bug or undocumented intended behavior, but either way I would expect `max_tokens` to abandon completing the regex once generation reaches the limit.
To Reproduce
The regex used is supposed to match 1 to 6 lines, where a line may be empty. Setting `max_tokens` to 10 makes `gen` create 1 to 6 lines, each with up to 10 tokens, rather than stopping at 10 tokens total. My expected behavior was for it to complete the word after "strange" ("machine" plus a newline, 2 tokens) and then have only 8 tokens left for the second line.
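As a sanity check on the pattern itself (independent of guidance), Python's `re` module confirms the intended "1 to 6 newline-terminated, possibly empty lines" reading:

```python
import re

pattern = r'([^\n]*\n){1,6}'

# One non-empty line terminated by a newline matches.
assert re.fullmatch(pattern, "machine\n") is not None

# Empty lines are allowed: two bare newlines are two empty lines.
assert re.fullmatch(pattern, "\n\n") is not None

# Six lines match, but a seventh exceeds the {1,6} bound.
assert re.fullmatch(pattern, "a\n" * 6) is not None
assert re.fullmatch(pattern, "a\n" * 7) is None
```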
```python
from guidance import models, gen

lm = models.LlamaCppChat(
    r"C:\Users\user\models\zephyr-7b-beta.Q3_K_S.gguf",
    n_gpu_layers=-1)
# I used song lyrics for the example since they tend to contain frequent line breaks
lm + "Lyrics:\n\nYo! Danny Fenton he was just fourteen\nWhen his parent built a very strange" + gen(regex='([^\n]*\n){1,6}', max_tokens=10)
```
The LM output from gen was this:
```
machine
It was a science experiment, they said
But it turned Danny into a living, breathing head
He could think faster than the speed of light
He could calculate a solution in the blink of an
```
The regex was satisfied (6 lines), but 52 tokens were generated, far more than 10.
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Windows 11
- Guidance Version (`guidance.__version__`): 0.1.13