llama-stack
Support "stop" parameter in inference providers
🚀 Describe the new functionality needed
"stop" parameter: https://platform.openai.com/docs/api-reference/completions/create#completions-create-stop
💡 Why is this needed? What if we don't build it?
Currently, only the vLLM inference provider supports and is tested with the `stop` parameter, via https://github.com/meta-llama/llama-stack/pull/1715. The other inference providers do not honor it, as sketched below.
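For context, the vLLM engine already exposes stop sequences through its sampling parameters, which is presumably what the vLLM provider builds on; a sketch at the engine level (not necessarily how the linked PR wires it up, and the model name is a placeholder):

```python
# Sketch of stop-sequence handling at the vLLM engine level.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(
    max_tokens=64,
    # vLLM stops decoding once any of these strings appears in the output.
    stop=["\n4.", "###"],
)
outputs = llm.generate(["List three colors:\n1."], params)
print(outputs[0].outputs[0].text)
```

Other providers would need an equivalent mapping from the request's `stop` field onto their own sampling or generation options.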
Other thoughts
No response