llama-stack
Support guided decoding with vllm and remote::vllm
🚀 The feature, motivation and pitch
Several inference providers (fireworks, together, meta-reference) already support guided decoding during inference -- for example, supplying a JSON schema as a "grammar" that constrains generation. vLLM supports this functionality as well; enable it in the API for the vllm and remote::vllm providers.
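As a rough sketch of what the provider would send, vLLM's OpenAI-compatible server accepts a `guided_json` extra parameter carrying a JSON schema that constrains decoding. The model name, schema fields, and helper function below are illustrative assumptions, not the actual llama-stack implementation:

```python
import json

# Illustrative schema the caller wants the model's output to conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}


def build_guided_request(prompt: str, json_schema: dict) -> dict:
    """Build a chat-completion payload with schema-constrained (guided) decoding.

    vLLM's OpenAI-compatible endpoint reads guided-decoding options such as
    `guided_json` from extra fields in the request body.
    """
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
        "guided_json": json_schema,  # constrain decoding to this schema
    }


payload = build_guided_request("Extract the movie name and release year.", schema)
print(json.dumps(payload["guided_json"]["required"]))
```

The remote::vllm provider could translate llama-stack's structured-output request into this payload shape when forwarding to a vLLM server.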
Alternatives
No alternatives; this is a core feature that should be supported by all providers wherever possible.
Additional context
No response