worker-vllm

Does LMFE_STRICT_JSON_FIELD_ORDER not work?

Open: endolith opened this issue 3 months ago • 1 comment

I set LMFE_STRICT_JSON_FIELD_ORDER=true in the environment variables on Runpod, but the LLM is still allowed to output JSON fields in whatever order it wants (for example, the actual response first and the reasoning behind it second).

Outlines is also supposed to enforce order, but doesn't in practice.

https://github.com/noamgat/lm-format-enforcer?tab=readme-ov-file#configuration-options

endolith, Nov 10 '25 03:11

guided_regex doesn't even seem to work.

Guided Decoding Constraint Violations

API Code

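# JSON Schema approach
# (client, MODEL_NAME_OPENAI_COMPATIBLE, SYSTEM_MESSAGE, and snippet are not shown here)
from pydantic import BaseModel, Field

class QuestionResponse(BaseModel):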
    reasoning: str = Field(min_length=1)
    question: str = Field(min_length=1)

json_schema = QuestionResponse.model_json_schema()

completion = client.chat.completions.create(
    model=MODEL_NAME_OPENAI_COMPATIBLE,
    messages=[{"role": "system", "content": SYSTEM_MESSAGE}, {"role": "user", "content": snippet}],
    max_tokens=500,
    temperature=0.7,
    extra_body={
        "guided_json": json_schema,
        "guided_decoding_backend": "outlines"  # or "lm-format-enforcer"
    }
)

# Regex approach
guided_regex_pattern = r'^REASONING: .+\nQUESTION: .+\nEND$'

completion = client.chat.completions.create(
    model=MODEL_NAME_OPENAI_COMPATIBLE,
    messages=[{"role": "system", "content": SYSTEM_MESSAGE}, {"role": "user", "content": snippet}],
    max_tokens=500,
    temperature=0.7,
    extra_body={
        "guided_regex": guided_regex_pattern,
        "guided_decoding_backend": "outlines"  # or "lm-format-enforcer"
    }
)

Environment variables: GUIDED_DECODING_BACKEND=outlines|lm-format-enforcer, LMFE_STRICT_JSON_FIELD_ORDER=true


Failures

1. JSON Schema (guided_json)

Enforced: {"reasoning": "...", "question": "..."} with reasoning before question

Got:

  • question field before reasoning (field order violation)
  • Sometimes question followed by \n\n\n\n... until token limit

Backends: Outlines, lm-format-enforcer (both failed)
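
For anyone trying to reproduce this, the field-order violation can be detected client-side. The following is a minimal sketch, not part of worker-vllm or either backend; the helper name check_field_order and the status strings are made up for illustration. It relies only on the fact that Python's json.loads builds a dict in document order.

```python
import json

# Declaration order of the fields in QuestionResponse.
EXPECTED_ORDER = ["reasoning", "question"]


def check_field_order(raw: str, expected=EXPECTED_ORDER) -> str:
    """Classify a raw completion as 'ok', 'wrong_order', or 'invalid_json'.

    'wrong_order' also covers missing or extra top-level keys.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        # Covers output that never closes the object, e.g. when the
        # model emits newlines until the token limit.
        return "invalid_json"
    if not isinstance(obj, dict):
        return "invalid_json"
    # json.loads builds a dict in document order, and dicts keep
    # insertion order, so the key list reflects the emitted order.
    return "ok" if list(obj) == list(expected) else "wrong_order"
```

With the OpenAI client used above, the call would be check_field_order(completion.choices[0].message.content).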


2. Regex Pattern (guided_regex)

Enforced: ^REASONING: .+\nQUESTION: .+\nEND$

  • Start with REASONING:
  • Single \n, then QUESTION:
  • Single \n, then END
  • End immediately after END

Got:

| Issue | Example | Expected | Got |
|---|---|---|---|
| Missing END | REASONING: ...\nQUESTION: ... | \nEND | (missing) |
| Wrong whitespace | ...? END | \nEND | (space)END |
| Multiple newlines | REASONING: ...\n\nQUESTION: ... | \nQUESTION: | \n\nQUESTION: |
| Leading invalid char | REASONING: ... | ^REASONING: | U+FFFDREASONING: |
| Truncation | ...biograp (cut off mid-word) | Complete | Incomplete, finish_reason: stop |

Backends: Outlines, lm-format-enforcer (both failed)
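
A quick way to confirm that a response really violates the pattern is to test it locally with Python's re module. This is a sketch under the assumption that Python's regex semantics are what I intended; the backends may interpret anchors differently, and the helper name matches_format is made up. re.fullmatch is used because in Python `$` also matches just before a trailing newline, so re.match would accept output ending in "END\n".

```python
import re

# Same pattern as passed to guided_regex.
PATTERN = re.compile(r'^REASONING: .+\nQUESTION: .+\nEND$')


def matches_format(text: str) -> bool:
    """Return True only if the whole completion matches the pattern.

    Without re.DOTALL, '.+' cannot cross a newline, so extra blank
    lines between the sections are rejected.
    """
    return PATTERN.fullmatch(text) is not None
```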


Common Issues

  1. Field order not enforced (JSON)
  2. Start anchor not enforced (Regex ^)
  3. End anchor not enforced (Regex $)
  4. Whitespace not enforced (Regex newlines/spaces)
  5. Truncation despite finish_reason: stop
  6. Encoding issues (U+FFFD replacement characters)
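
To tally which of these issues a given regex-mode response shows, something like the following could be run over the collected outputs. It is only a rough sketch: the function name diagnose and the labels are illustrative, and it cannot detect truncation (issue 5), since that requires comparing the text against finish_reason from the API response.

```python
def diagnose(text: str) -> list[str]:
    """Label a regex-mode completion with the issues it exhibits."""
    issues = []
    if not text.startswith("REASONING: "):
        issues.append("start anchor")
    if not text.endswith("\nEND"):
        issues.append("end anchor")
    # The pattern allows exactly two newlines: one before QUESTION:
    # and one before END.
    if text.count("\n") != 2:
        issues.append("whitespace")
    if "\ufffd" in text:
        issues.append("encoding")
    return issues
```

An empty list means none of these checks fired; it does not by itself prove the output matches the full pattern.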

endolith, Nov 11 '25 04:11