Maxime
This helps a lot; it now takes 10 minutes vs a few hours before. That being said, I think we should keep this issue open to track vsxmake improvements, even...
I'm seeing the same issue with 0.8.5 + Qwen3 235B: structured decoding produces nonsense and runs until it hits max_tokens. The exact same code works with 0.8.4 and DeepSeek V3 0324.
Example output (truncated):

```
ModelResponse(id='chatcmpl-8a2538df23454cbab477b35f50ccc4eb', created=1745943995, model='litellm_proxy/hosted_vllm/qwen3-235b', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='length', index=0, message=Message(content='{\n  "tasks": [\n    {\n      "gap": "Current symptoms and severity of the acute otitis media (e.g., ear pain, fever, hearing loss,...
```
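For what it's worth, here's a minimal sketch of how I'd flag this failure mode in client code. The helper name and the response shape are my own illustration (a plain dict mirroring the `ModelResponse` above, not an actual litellm/vLLM API): a `finish_reason` of `'length'` combined with unparseable JSON is a strong signal that structured decoding ran off the rails and got cut at `max_tokens`.

```python
import json

def check_structured_output(response: dict) -> dict:
    """Classify a chat-completion response whose content is expected to be JSON.

    Hypothetical helper: `response` is a plain dict shaped like the
    ModelResponse in the report (choices[0].finish_reason, choices[0].message.content).
    """
    choice = response["choices"][0]
    result = {
        # 'length' means generation stopped at max_tokens, i.e. truncated
        "truncated": choice["finish_reason"] == "length",
        "valid_json": False,
    }
    try:
        json.loads(choice["message"]["content"])
        result["valid_json"] = True
    except json.JSONDecodeError:
        pass  # truncated or malformed structured output
    return result

# Simulated response mimicking the truncated output in the report:
resp = {
    "choices": [{
        "finish_reason": "length",
        "message": {"content": '{\n  "tasks": [\n    {\n      "gap": "Current sympt'},
    }]
}
print(check_structured_output(resp))  # -> {'truncated': True, 'valid_json': False}
```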