
Using a CFG with a <think>.+</think> section, when there is a special token <think>, breaks the CFG with "ParserTooComplex"

Open · lsb opened this issue 2 months ago · 1 comment

Describe the issue as clearly as possible:

Use case: I want to constrain my output with a CFG, and I want some arbitrary thinking to happen beforehand. How I am solving this: pass a CFG with an explicit <think>...</think> section at the beginning, followed by my actual grammar. What I have found: when my LLM's tokenizer includes <think> as a special token, this breaks; when the tokenizer doesn't have it in its vocabulary, everything is fine. Potential workaround: run inference once without the CFG, extract the thinking section, then run inference again with the CFG and the thinking section pre-stuffed into the assistant's response (sketched below).

This is distinct from #1627.

The attached code breaks on Qwen3-4B-Thinking but works fine on SmolLM2. Crucially, there is a ParserTooComplex error when the tokenizer vocabulary includes <think>, and there is no error when the vocabulary doesn't.
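For reference, here is a rough sketch of that two-pass workaround. It reuses the model, prompt, and grammars from the reproduction script below; the unconstrained-generation call and the prompt stitching are approximations I haven't verified, so treat it as an illustration rather than tested code.

```python
import re

from outlines.types import CFG

# Sketch of the workaround, reusing `model` and `prompt` from the script below.
# Untested: the unconstrained call and the prompt stitching are approximations.

# Pass 1: generate without a CFG so the model can emit <think>...</think> freely.
raw = model(prompt, max_new_tokens=2000)

# Extract the thinking section (empty if the tags are missing).
match = re.search(r"<think>.*?</think>", raw, flags=re.DOTALL)
thinking = match.group(0) if match else ""

# Pass 2: constrain only what follows the thinking section, with a grammar
# that no longer mentions the <think> special tokens.
answer_grammar = CFG('?start: answer\nanswer: "yes" | "no"')
answer = model(prompt + "\n" + thinking + "\n", answer_grammar, max_new_tokens=10)
```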

Steps/code to reproduce the bug:

"""
Minimal reproducible example for outlines CFG bug with <think> special tokens.

This demonstrates that when a model has special tokens for <think> and </think>,
outlines CFG grammar fails to parse them correctly.

Expected behavior: Grammar should constrain output to have <think>...</think> followed by yes|no
Actual behavior: Parser error when trying to match special tokens against literal strings

Model: Qwen/Qwen3-4B-Thinking-2507 (has <think> token ID 151667, </think> token ID 151668)
"""

import transformers
from outlines import Transformers
from outlines.types import CFG


def main():
    print("=== Outlines CFG Bug: Special Tokens in Grammar ===\n")

    print(f"Loading model...")
    pipe = transformers.pipeline(
        "text-generation",
        # "HuggingFaceTB/SmolLM2-1.7B-Instruct",
        "Qwen/Qwen3-4B-Thinking-2507",
    )

    # Show that <think> and </think> are special tokens
    print("\n--- Tokenizer Analysis ---")
    vocab = pipe.tokenizer.get_vocab()
    think_start_id = vocab.get('<think>')
    think_end_id = vocab.get('</think>')

    print(f"<think> token ID: {think_start_id}")
    print(f"</think> token ID: {think_end_id}")

    # Show how they encode
    encoded_start = pipe.tokenizer.encode('<think>', add_special_tokens=False)
    encoded_end = pipe.tokenizer.encode('</think>', add_special_tokens=False)
    print(f"<think> encodes to: {encoded_start} (single token)")
    print(f"</think> encodes to: {encoded_end} (single token)")

    # Create outlines model
    print("\n--- Setting up Outlines ---")
    model = Transformers(pipe.model, pipe.tokenizer)

    # Define a grammar that includes <think> tags
    # This SHOULD work but DOESN'T due to special token handling
    grammar_with_thinking = '''
?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\\r\\n\\t ]*/
answer: "yes" | "no"
'''

    print("Grammar:")
    print(grammar_with_thinking)

    cfg_type = CFG(grammar_with_thinking)
    prompt = "Is the sky blue?"

    print(f"\n--- Attempting Generation ---")
    print(f"Prompt: {prompt}")
    print("Expected: <think>reasoning here</think>\\nyes")
    print("\nGenerating...")

    try:
        response = model(prompt, cfg_type, max_new_tokens=10000)
        print(f"\nSuccess! Response: {response}")
    except Exception as e:
        print(f"\n❌ ERROR: {type(e).__name__}: {e}")
        print("\nThis demonstrates the bug: outlines cannot match special tokens")
        print("in the grammar against the tokenizer's single-token representation.")

    # Show that a grammar without <think> tags works fine
    print("\n\n--- Testing Grammar Without Special Tokens ---")
    grammar_without_thinking = '''
?start: answer
answer: "yes" | "no"
'''

    print("Grammar (no special tokens):")
    print(grammar_without_thinking)

    cfg_type_simple = CFG(grammar_without_thinking)

    try:
        response = model(prompt, cfg_type_simple, max_new_tokens=10)
        print(f"\n✓ Success! Response: {response}")
        print("\nThis works because there are no special tokens in the grammar.")
    except Exception as e:
        print(f"\n❌ ERROR: {type(e).__name__}: {e}")

if __name__ == "__main__":
    main()
```

Expected result:

By uncommenting the SmolLM2 model specification and commenting out the Qwen3 one, the code runs through with two successes, one for the grammar with the <think> section and one for the simple grammar.
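As a quick sanity check of the vocabulary difference driving this (hypothetical snippet, not part of the reproduction script above):

```python
from transformers import AutoTokenizer

# Compare how each tokenizer treats "<think>". For Qwen3-Thinking it should be
# a single added special token (ID 151667); for SmolLM2 it should split into
# several ordinary tokens and be absent from the vocabulary.
for name in ["Qwen/Qwen3-4B-Thinking-2507", "HuggingFaceTB/SmolLM2-1.7B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok.encode("<think>", add_special_tokens=False)
    print(name, ids, "<think>" in tok.get_vocab())
```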

Error message:

```
.venv/lib/python3.13/site-packages/outlines/backends/llguidance.py:175: UserWarning: Error in LLMatcher: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
<state>
Tokens: ⟦<think>⟧
1 tokens, 0 bytes; grm_prefix: ""
Flags:
Parser: {
  "compute_time_us": 0,
  "rows": 2,
  "cached_rows": 0,
  "all_items": 4,
  "lexer_cost": 3271,
  "slices_applied": 0,
  "trie_nodes_walked": 0,
  "definitive_bytes": 7,
  "lexer_ops": 0,
  "num_lex_errors": 0,
  "num_lexemes": 0
}
Stop: ParserTooComplex
Error: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
</state><grammar>

?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\r\n\t ]*/
answer: "yes" | "no"

</grammar>
```

Outlines/Python version information:

Version information

```
% python -c "from outlines import _version; print(_version.version)"; python -c "import sys; print('Python', sys.version)"; uv pip freeze;
1.2.7
Python 3.13.3 (main, Apr 8 2025, 13:54:08) [Clang 17.0.0 (clang-1700.0.13.3)]
accelerate==1.10.1
aiofiles==24.1.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.0
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.11.0
attrs==25.4.0
audioop-lts==0.2.2
brotli==1.1.0
certifi==2025.10.5
charset-normalizer==3.4.4
click==8.3.0
cloudpickle==3.1.1
datasets==4.2.0
dill==0.4.0
diskcache==5.6.3
fastapi==0.119.0
ffmpy==0.6.3
filelock==3.20.0
frozenlist==1.8.0
fsspec==2025.9.0
genson==1.3.0
gradio==5.49.1
gradio-client==1.13.3
groovy==0.1.2
h11==0.16.0
hf-xet==1.1.10
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.35.3
idna==3.11
iniconfig==2.1.0
jinja2==3.1.6
joblib==1.5.2
jsonpath-ng==1.7.0
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
llguidance==1.2.0
markdown-it-py==4.0.0
markupsafe==3.0.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.7.0
multiprocess==0.70.16
networkx==3.5
ninja==1.13.0
numpy==2.3.4
optimum-quanto==0.2.7
orjson==3.11.3
outlines==1.2.7
outlines-core==0.2.11
packaging==25.0
pandas==2.3.3
pillow==11.3.0
pluggy==1.6.0
ply==3.11
propcache==0.4.1
psutil==7.1.0
pyarrow==21.0.0
pydantic==2.11.10
pydantic-core==2.33.2
pydub==0.25.1
pygments==2.19.2
pytest==8.4.2
python-dateutil==2.9.0.post0
python-multipart==0.0.20
pytz==2025.2
pyyaml==6.0.3
referencing==0.37.0
regex==2025.9.18
requests==2.32.5
rich==14.2.0
rpds-py==0.27.1
ruff==0.14.0
safehttpx==0.1.6
safetensors==0.6.2
scikit-learn==1.7.2
scipy==1.16.2
semantic-version==2.10.0
sentence-transformers==5.1.1
sentencepiece==0.2.1
setuptools==80.9.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.48.0
sympy==1.14.0
threadpoolctl==3.6.0
tokenizers==0.22.1
tomlkit==0.13.3
torch==2.9.0
tqdm==4.67.1
transformers==4.57.1
typer==0.19.2
typing-extensions==4.15.0
typing-inspection==0.4.2
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.37.0
websockets==15.0.1
xxhash==3.6.0
yarl==1.22.0
```

Context for the issue:

No response

lsb · Oct 15 '25 23:10

Thanks for the detailed issue! We're working on adding explicit support for reasoning models; this example will surely be very useful for understanding the problem and finding a way to avoid it.

RobinPicard · Oct 17 '25 12:10