Using a CFG with a <think>.+</think> section breaks with "ParserTooComplex" when <think> is a special token
Describe the issue as clearly as possible:
Use case: I want to constrain my output with a CFG, and I want some arbitrary thinking to happen beforehand.
How I am solving this: pass a CFG with an explicit <think> ... </think> section ahead of the constrained answer (see the grammar in the repro below).
This is distinct from #1627.
The attached code breaks on Qwen3-4B-Thinking, but works fine on SmolLM2. Crucially, the ParserTooComplex error only occurs when the tokenizer vocabulary includes <think> and </think> as dedicated single special tokens.
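As a quick sanity check independent of outlines (a minimal sketch; it only assumes the two checkpoint names used in the repro below), you can compare how each tokenizer encodes the literal string "<think>":

from transformers import AutoTokenizer

# Compare tokenization of "<think>" across the two models. On
# Qwen3-4B-Thinking it comes back as one special token (151667); on
# SmolLM2, which has no such special token, it should split into
# several ordinary tokens, presumably why the grammar works there.
for name in ("Qwen/Qwen3-4B-Thinking-2507", "HuggingFaceTB/SmolLM2-1.7B-Instruct"):
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.encode("<think>", add_special_tokens=False))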
Steps/code to reproduce the bug:
"""
Minimal reproducible example for outlines CFG bug with <think> special tokens.
This demonstrates that when a model has special tokens for <think> and </think>,
outlines CFG grammar fails to parse them correctly.
Expected behavior: Grammar should constrain output to have <think>...</think> followed by yes|no
Actual behavior: Parser error when trying to match special tokens against literal strings
Model: Qwen/Qwen3-4B-Thinking-2507 (has <think> token ID 151667, </think> token ID 151668)
"""
import transformers
from outlines import Transformers
from outlines.types import CFG
def main():
print("=== Outlines CFG Bug: Special Tokens in Grammar ===\n")
print(f"Loading model...")
pipe = transformers.pipeline(
"text-generation",
# "HuggingFaceTB/SmolLM2-1.7B-Instruct",
"Qwen/Qwen3-4B-Thinking-2507",
)
# Show that <think> and </think> are special tokens
print("\n--- Tokenizer Analysis ---")
vocab = pipe.tokenizer.get_vocab()
think_start_id = vocab.get('<think>')
think_end_id = vocab.get('</think>')
print(f"<think> token ID: {think_start_id}")
print(f"</think> token ID: {think_end_id}")
# Show how they encode
encoded_start = pipe.tokenizer.encode('<think>', add_special_tokens=False)
encoded_end = pipe.tokenizer.encode('</think>', add_special_tokens=False)
print(f"<think> encodes to: {encoded_start} (single token)")
print(f"</think> encodes to: {encoded_end} (single token)")
# Create outlines model
print("\n--- Setting up Outlines ---")
model = Transformers(pipe.model, pipe.tokenizer)
# Define a grammar that includes <think> tags
# This SHOULD work but DOESN'T due to special token handling
grammar_with_thinking = '''
?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\\r\\n\\t ]*/
answer: "yes" | "no"
'''
print("Grammar:")
print(grammar_with_thinking)
cfg_type = CFG(grammar_with_thinking)
prompt = "Is the sky blue?"
print(f"\n--- Attempting Generation ---")
print(f"Prompt: {prompt}")
print("Expected: <think>reasoning here</think>\\nyes")
print("\nGenerating...")
try:
response = model(prompt, cfg_type, max_new_tokens=10000)
print(f"\nSuccess! Response: {response}")
except Exception as e:
print(f"\n❌ ERROR: {type(e).__name__}: {e}")
print("\nThis demonstrates the bug: outlines cannot match special tokens")
print("in the grammar against the tokenizer's single-token representation.")
# Show that a grammar without <think> tags works fine
print("\n\n--- Testing Grammar Without Special Tokens ---")
grammar_without_thinking = '''
?start: answer
answer: "yes" | "no"
'''
print("Grammar (no special tokens):")
print(grammar_without_thinking)
cfg_type_simple = CFG(grammar_without_thinking)
try:
response = model(prompt, cfg_type_simple, max_new_tokens=10)
print(f"\n✓ Success! Response: {response}")
print("\nThis works because there are no special tokens in the grammar.")
except Exception as e:
print(f"\n❌ ERROR: {type(e).__name__}: {e}")
if __name__ == "__main__":
main()
Expected result:
If you instead comment out the Qwen3 checkpoint and uncomment the SmolLM2 one, the script completes with two successes: both the grammar containing the <think> section and the plain yes/no grammar generate without error.
Error message:
.venv/lib/python3.13/site-packages/outlines/backends/llguidance.py:175: UserWarning: Error in LLMatcher: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
<state>
Tokens: ⟦<think>⟧
1 tokens, 0 bytes; grm_prefix: ""
Flags:
Parser: {
"compute_time_us": 0,
"rows": 2,
"cached_rows": 0,
"all_items": 4,
"lexer_cost": 3271,
"slices_applied": 0,
"trie_nodes_walked": 0,
"definitive_bytes": 7,
"lexer_ops": 0,
"num_lex_errors": 0,
"num_lexemes": 0
}
Stop: ParserTooComplex
Error: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
</state><grammar>
?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\r\n\t ]*/
answer: "yes" | "no"
</grammar>
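A possible workaround while this is unresolved (a sketch only, untested; it assumes outlines' Transformers model can be called without an output type for unconstrained generation, and that the model reliably closes its reasoning with </think>): generate the thinking pass unconstrained, then apply a CFG that covers only the final answer.

# Two-pass sketch: free-form reasoning first, constrained answer second.
# Reuses `model`, `prompt`, and the CFG import from the repro above; untested.
answer_grammar = CFG('''
?start: answer
answer: "yes" | "no"
''')

thinking = model(prompt, max_new_tokens=4096)  # pass 1: unconstrained reasoning
answer = model(prompt + thinking, answer_grammar, max_new_tokens=10)  # pass 2: CFG applies only here
print(answer)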
Thanks for the detailed issue! We're working on adding explicit support for reasoning models; this example will surely be very useful for understanding the problem and finding a way to avoid it.