worker-vllm Guided decoding backends not working

Neither the old guided decoding API (guided_json) nor the new structured outputs API (structured_outputs) works in this worker. With the old API, JSON schema constraints are ignored; with the new API, the request fails with a parameter error.

Issues:

Old API (guided_json): Schema constraints are ignored entirely. The model follows a contradictory system prompt instead of producing the enforced JSON.
New API (structured_outputs): The request fails with the error "Unexpected keyword argument 'extra_body'".
Both backends are affected: neither outlines nor lm-format-enforcer enforces the constraints.

Steps to reproduce:

Test 1 - Old API failure:

{
  "input": {
    "messages": [
      {
        "role": "system", 
        "content": "OUTPUT YAML ONLY: reasoning: |\n  your thoughts\nquestion: your question"
      },
      {
        "role": "user",
        "content": "Analyze: The Eiffel Tower was completed in 1889 for the World's Fair."
      }
    ],
    "sampling_params": {
      "max_tokens": 500,
      "temperature": 0.1
    },
    "guided_json": {
      "type": "object",
      "properties": {
        "reasoning": {"type": "string"},
        "question": {"type": "string"}
      },
      "required": ["reasoning", "question"]
    },
    "guided_decoding_backend": "outlines"
  }
}

Result: The model outputs YAML and ignores the JSON schema constraints.
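
To make "ignores the schema" checkable rather than a matter of reading the output by eye, a small script can test whether the returned text parses as JSON and contains the two required string fields. This is a minimal sketch using only the standard library; the function name conforms_to_schema and the sample strings are made up for illustration and are not captured worker output.

```python
import json

# Field names taken from the guided_json schema in Test 1.
REQUIRED_STRING_FIELDS = ("reasoning", "question")


def conforms_to_schema(output_text: str) -> bool:
    """Return True if output_text is a JSON object with the required string fields."""
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return False  # YAML or free text ends up here
    if not isinstance(parsed, dict):
        return False
    return all(isinstance(parsed.get(name), str) for name in REQUIRED_STRING_FIELDS)


# Illustrative strings only, not real model output.
yaml_style = "reasoning: |\n  your thoughts\nquestion: your question"
json_style = '{"reasoning": "your thoughts", "question": "your question"}'

print(conforms_to_schema(yaml_style))
print(conforms_to_schema(json_style))
```

If guided decoding were active, the check would be expected to pass for every response regardless of what the system prompt asks for.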

Test 2 - New API failure:

{
  "input": {
    "messages": [
      {"role": "user", "content": "Analyze: Some text"}
    ],
    "sampling_params": {
      "max_tokens": 500,
      "temperature": 0.1,
      "extra_body": {
        "structured_outputs": {
          "json": {
            "type": "object",
            "properties": {
              "reasoning": {"type": "string"},
              "question": {"type": "string"}
            },
            "required": ["reasoning", "question"]
          }
        }
      }
    }
  }
}

Result: "Unexpected keyword argument 'extra_body'" error.
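
One way this error could arise, independent of the vLLM version: in the OpenAI Python client, extra_body is a client-side argument whose contents are merged into the request body, so it is not itself a sampling parameter. If the worker forwards every key in sampling_params as a keyword argument to a parameters class (an assumption, not verified against the worker source), an unknown key such as extra_body would be rejected. The sketch below uses a hypothetical ToySamplingParams stand-in, not vLLM's real class, and the exact error wording depends on how the real class is implemented.

```python
from dataclasses import dataclass


@dataclass
class ToySamplingParams:
    """Hypothetical stand-in for a sampling-parameters class; not vLLM's real one."""
    max_tokens: int = 16
    temperature: float = 1.0


def build_params(sampling_params: dict) -> ToySamplingParams:
    # Assumed wrapper behaviour: forward every key as a keyword argument.
    return ToySamplingParams(**sampling_params)


request_params = {
    "max_tokens": 500,
    "temperature": 0.1,
    "extra_body": {"structured_outputs": {"json": {"type": "object"}}},
}

try:
    build_params(request_params)
    error_message = None
except TypeError as exc:
    # The unknown key is rejected before any decoding logic runs.
    error_message = str(exc)

print(error_message)
```

If this is what happens, the error says nothing about whether structured outputs are supported; it only shows that extra_body is not a recognised key at that level of the payload.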

Environment:

Worker: worker-vllm v2.9.6 (vLLM 0.11.0)
GUIDED_DECODING_BACKEND environment variable set to outlines

Possible cause: The worker was updated to vLLM 0.11.0 but may not have been updated to handle the accompanying API changes. vLLM 0.11.0 introduced the new structured_outputs API and deprecated the old guided decoding fields. The worker's custom wrapper doesn't support the new API, and the old API may no longer function properly.

Both guided decoding approaches are currently unusable in this worker version.

Nov 12 '25 21:11 endolith

~~Maybe this is actually the reason for https://github.com/runpod-workers/worker-vllm/issues/233~~

Nov 12 '25 21:11 endolith

I deployed worker-vllm v2.9.4, which uses the vLLM library v0.10.0. My goal was to test the old guided_json API on a version that predates the structured_outputs API introduced in vLLM 0.11.0.

Results:

structured_outputs API (Test 1): As expected for vLLM 0.10.0, this new API failed with the error "Unexpected keyword argument 'extra_body'". This confirms the API is not available in this version of the underlying library.
guided_json API (Tests 2 & 3): The API call succeeded, but the core functionality was still broken. The model output was YAML (as instructed by the system prompt) and did not conform to the JSON schema specified in the guided_json parameter. The schema constraints were ignored.

Conclusion:

Downgrading to worker-vllm v2.9.4 (vLLM 0.10.0) did not fix the issue with guided decoding. The guided_json API, which should have been the correct and supported method in this version, failed to enforce the specified JSON schema. The output format was determined by the conflicting system prompt rather than the guided_json constraint. Therefore, the core problem existed in this older worker version as well.

Test 1:

{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "OUTPUT YAML ONLY: reasoning: |\n  your thoughts\nquestion: your question"
      },
      {
        "role": "user",
        "content": "Analyze: The Eiffel Tower was completed in 1889 for the World's Fair."
      }
    ],
    "sampling_params": {
      "max_tokens": 500,
      "temperature": 0.1,
      "extra_body": {
        "structured_outputs": {
          "json": {
            "type": "object",
            "properties": {
              "reasoning": {"type": "string"},
              "question": {"type": "string"}
            },
            "required": ["reasoning", "question"]
          }
        }
      }
    }
  }
}

Test 2:

{
  "input": {
    "messages": [
      {
        "role": "system", 
        "content": "OUTPUT YAML ONLY: reasoning: |\n  your thoughts\nquestion: your question"
      },
      {
        "role": "user",
        "content": "Analyze: The Eiffel Tower was completed in 1889 for the World's Fair."
      }
    ],
    "sampling_params": {
      "max_tokens": 500,
      "temperature": 0.1
    },
    "guided_json": {
      "type": "object", 
      "properties": {
        "reasoning": {"type": "string"},
        "question": {"type": "string"}
      },
      "required": ["reasoning", "question"]
    },
    "guided_decoding_backend": "outlines"
  }
}

Test 3:

{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "OUTPUT YAML ONLY: reasoning: |\n  your thoughts\nquestion: your question"  
      },
      {
        "role": "user",
        "content": "Analyze: The Eiffel Tower was completed in 1889 for the World's Fair."
      }
    ],
    "sampling_params": {
      "max_tokens": 500,
      "temperature": 0.1
    },
    "guided_json": {
      "type": "object",
      "properties": {
        "reasoning": {"type": "string"},
        "question": {"type": "string"}
      },
      "required": ["reasoning", "question"]
    },
    "guided_decoding_backend": "lm-format-enforcer"
  }
}
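
Tests 2 and 3 differ only in the guided_decoding_backend value, so the payloads can be generated from one template to rule out copy-paste differences between runs. This is a minimal sketch; build_guided_json_payload is a hypothetical helper name, and the payload layout is copied from the tests above rather than from any worker documentation.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "question": {"type": "string"},
    },
    "required": ["reasoning", "question"],
}

MESSAGES = [
    {
        "role": "system",
        "content": "OUTPUT YAML ONLY: reasoning: |\n  your thoughts\nquestion: your question",
    },
    {
        "role": "user",
        "content": "Analyze: The Eiffel Tower was completed in 1889 for the World's Fair.",
    },
]


def build_guided_json_payload(backend: str) -> dict:
    """Build the request body used in Tests 2 and 3 for the given backend."""
    return {
        "input": {
            "messages": MESSAGES,
            "sampling_params": {"max_tokens": 500, "temperature": 0.1},
            "guided_json": SCHEMA,
            "guided_decoding_backend": backend,
        }
    }


payloads = {
    backend: build_guided_json_payload(backend)
    for backend in ("outlines", "lm-format-enforcer")
}

print(json.dumps(payloads["lm-format-enforcer"], indent=2))
```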

Nov 12 '25 22:11 endolith

> Maybe this is actually the reason for #233

No, I got Structured Outputs working now, but it still doesn't enforce field order, whether using

GUIDED_DECODING_BACKEND=outlines

or

GUIDED_DECODING_BACKEND=lm-format-enforcer
LMFE_STRICT_JSON_FIELD_ORDER=true
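
The ordering problem can be checked the same way as schema conformance. The sketch below assumes the expected order is the order in which the fields appear under "properties" in the schema (reasoning first, then question); keys_in_schema_order is a hypothetical helper and the sample strings are illustrative, not real worker output. It relies on json.loads preserving key order, which holds for Python 3.7 and later.

```python
import json

# Assumed expected order: the order of "properties" in the schema.
EXPECTED_ORDER = ["reasoning", "question"]


def keys_in_schema_order(output_text: str, expected=EXPECTED_ORDER) -> bool:
    """Return True if the expected keys appear in the output in schema order."""
    parsed = json.loads(output_text)
    # Keys from the output that the schema knows about, in output order.
    seen = [key for key in parsed if key in expected]
    # The same keys, in schema order.
    wanted = [key for key in expected if key in parsed]
    return seen == wanted


# Illustrative strings only, not real model output.
in_order = '{"reasoning": "thoughts", "question": "what next?"}'
reversed_order = '{"question": "what next?", "reasoning": "thoughts"}'

print(keys_in_schema_order(in_order))
print(keys_in_schema_order(reversed_order))
```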

Nov 14 '25 16:11 endolith

Guided decoding backends not working - both old and new APIs fail