stop_regex not working as expected
The bug
Hi, I observed bad output quality and artifacts in a JSON generator I built with guidance. I narrowed the problem down, and it seems generation doesn't end when I would expect it to when using the stop_regex argument. (I hope this isn't just a misunderstanding on my part, but I don't think so.)
To Reproduce
I narrowed it down to the following minimal example.
For debugging, I added the following snippet at line 283 of guidance/models/_engine/_engine.py to print the model's top tokens:
# note: requires numpy (import numpy as np) at the top of the module
top_indices = np.argsort(logits[-1, :])[-5:][::-1]  # indices of the 5 highest logits
top_logits = logits[-1, :][top_indices]
top_tokens = [self.tokenizer.decode([idx]) for idx in top_indices]
print("Top 5 tokens:", list(zip(top_tokens, top_logits.tolist())))
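For reference, here is a self-contained sketch of the same top-k inspection, using dummy logits and a stub vocabulary in place of the engine's real tokenizer (both are assumptions for illustration only):

```python
import numpy as np

# Stub vocabulary and dummy logits standing in for the engine's tokenizer
# and model output (shape: sequence x vocab).
vocab = [b' "', b'test', b'",\n', b' word', b'2']
logits = np.array([[1.0, 5.0, 4.0, 2.0, 3.0]])

top_indices = np.argsort(logits[-1, :])[-3:][::-1]  # indices of the 3 highest logits
top_logits = logits[-1, :][top_indices]
top_tokens = [vocab[idx] for idx in top_indices]
print("Top 3 tokens:", list(zip(top_tokens, top_logits.tolist())))
# Top 3 tokens: [(b'test', 5.0), (b'",\n', 4.0), (b'2', 3.0)]
```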
Then execute the following example:
from guidance import system, user, assistant, gen
from guidance.models import Transformers

lm = Transformers("microsoft/Phi-4-mini-instruct")

with system():
    lm += "be helpful"

with user():
    lm += """Please just copy and reprint exactly the following json object, do not add anything else:
{
word1: "test",
word2: "test",
}
"""

with assistant():
    lm += '''
{
word1: "'''.strip()  # here I want the model to continue; I expect it to complete this line with: test",
    lm += gen(stop_regex='"', max_tokens=10)

print(str(lm))
Now, since I set the stop_regex to stop at ", I expect the model to stop after ...test". However, it doesn't. Here's the output of the above program:
Top 5 tokens: [(b' "', 54.646568298339844), (b' "",\n', 39.89239501953125), (b' "\n', 39.43955612182617), (b' \xe2\x80\x9c', 39.37717056274414), (b' \\"', 38.85851287841797)]
Top 5 tokens: [(b'test', 52.59314727783203), (b'word', 39.43867111206055), (b'text', 38.878150939941406), (b't', 37.80302429199219), (b'testing', 36.417659759521484)]
Top 5 tokens: [(b'",\n', 59.446231842041016), (b'",', 46.492584228515625), (b',\n', 45.68318557739258), (b'"\n', 42.614723205566406), (b',', 42.359291076660156)]
Top 5 tokens: [(b' ', 41.6693000793457), (b'}\n', 35.06786346435547), (b'}', 34.63892364501953), (b'}\n\n', 33.73918533325195), (b' ', 33.1707763671875)]
Top 5 tokens: [(b' word', 44.99620819091797), (b' "', 35.925697326660156), (b' w', 33.24827194213867), (b' wo', 32.69148254394531), (b' world', 32.17226028442383)]
Top 5 tokens: [(b'2', 47.63304138183594), (b'1', 39.263710021972656), (b'3', 37.8875732421875), (b'4', 34.72578811645508), (b':', 34.53622055053711)]
Top 5 tokens: [(b':', 48.771244049072266), (b'":', 39.95098876953125), (b':"', 37.4881477355957), (b':\n', 36.4210319519043), (b':",', 33.71699523925781)]
Top 5 tokens: [(b' "', 47.51124572753906), (b' test', 39.59502410888672), (b' "\n', 37.369503021240234), (b' ', 36.77366256713867), (b' ""', 36.00505828857422)]
<|system|>be helpful<|end|><|user|>Please just copy and reprint exactly the following json object, do not add anything else:
{
word1: "test",
word2: "test",
}
<|end|><|assistant|>{
word1: "test,
word2:
As you can see in the debug output, the model correctly predicts ",\n as the top token after word1: "test, but that token isn't used for some reason, and generation doesn't stop even though the token contains ". It only stops in the next line, when the top predicted token is ".
The same happens with greedy sampling (top_k=1), and with other patterns such as '.*".*' or '["]'.
It only does NOT happen if I exactly match the predicted token, i.e. if I specify stop_regex='",\n', but that's not actually how the feature is intended to work, is it?
Edit: interestingly, the problem also occurs when I specify stop_regex=['"', '",\n'], even though it doesn't happen with stop_regex='",\n' alone.
System info:
- Ubuntu 24.04.2
- guidance: 0.2.4
Hi @luk400, thanks for the issue!
I believe what you're seeing here is a consequence of the fact that the token ",\n is actually disallowed: the grammar is required to stop after producing a ", but that token continues on with ,\n.
The issue you are seeing disappears if you follow it up with the literal text that you expect to follow it, e.g. like so:
lm += '''
{
word1: "'''.strip() # here i want the model to continue, and i expect it to complete this line with: test",
lm += gen(stop_regex='"', max_tokens=10) + '",\n"'
Sadly, grammars are somewhat non-associative in this way:
lm += foo
lm += bar
is sometimes meaningfully different from
lm += (foo + bar)
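Both points can be illustrated with a toy model. This is a simplified sketch for intuition only, not guidance's actual engine logic (the real engine tracks grammar state per token and handles stop-text inclusion separately): a token is masked out if its bytes would run past the end of the stop match, unless literal text composed after the gen() consumes the overflow.

```python
import re

def token_allowed(prefix, token, stop_pattern, following_literal=""):
    """Toy model: is `token` allowed given the text generated so far (`prefix`),
    a stop regex, and any literal text composed after gen() in the grammar?"""
    text = prefix + token
    m = re.compile(stop_pattern).search(text)
    if m is None:
        return True            # no stop match yet; generation continues
    overflow = text[m.end():]  # token bytes running past the stop match
    # allowed only if the following literal consumes the overflow
    return following_literal.startswith(overflow)

# gen(stop_regex='"') alone: the token '",\n' overflows the stop by ',\n'
print(token_allowed("test", '",\n', '"'))         # False: token masked out
# gen(stop_regex='"') + ',\n': the same token's overflow is now consumed
print(token_allowed("test", '",\n', '"', ',\n'))  # True: token allowed
```

In this toy model, when `lm += gen(...)` is a statement of its own, there is no following literal at sampling time to absorb the overflow, whereas in `lm += gen(...) + literal` there is; that is the intuition behind the non-associativity above.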
All of this being said, I'd advise you to use guidance's built-in json functionality, as it's a fairly feature-complete implementation that translates JSON Schemas directly into efficient guidance grammars.
In your case,
from guidance import system, user, assistant, json
from guidance.models import Transformers

lm = Transformers("microsoft/Phi-4-mini-instruct")

with system():
    lm += "be helpful"

with user():
    lm += """Please just copy and reprint exactly the following json object, do not add anything else:
{
word1: "test",
word2: "test",
}
"""

with assistant():
    lm += json(
        schema={
            "type": "object",
            "properties": {
                "word1": {"type": "string", "maxLength": 30},
                "word2": {"type": "string", "maxLength": 30},
            },
            "additionalProperties": False,
        }
    )

print(str(lm))
Interesting, thanks for the information!
I thought it was a bug since I didn't notice these issues before upgrading from guidance 0.1.x.
The reason I haven't used the built-in JSON feature so far is that I liked the fine-grained control over the JSON generation process (the ability to check values while they're being generated, to exit early based on specific conditions, etc.), which I found especially useful when generating rather large/complex JSON objects. (Also, though this is a minor reason, this paper claims their "whitespace-flexible" implementation using guidance outperforms more standard, strict implementations, see appendix A: https://arxiv.org/pdf/2403.06988. That might already be accounted for in the built-in JSON feature of guidance these days, though; I don't know.)
Anyway, I'll just adapt my program to use the built-in JSON generation and see how it compares to my prior solution. Feel free to close this issue :)
As a side note: it might be worthwhile to document behaviour like this in a troubleshooting section or similar, so users who encounter it don't get confused by unexpected outputs.