Ben Browning
I believe the linked code generation already handles (or at least attempts to handle) the fact that OpenAI-compatible API delete methods need to return objects. See a few lines below...
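For context, a minimal sketch of the shape those delete methods are expected to return — following the OpenAI Files API delete response, which returns a small JSON object rather than an empty body. The `delete_response` helper here is hypothetical, just to illustrate the shape:

```python
def delete_response(object_id: str, object_type: str) -> dict:
    # OpenAI-compatible DELETE endpoints return an object like this
    # instead of an empty 204 response (shape follows the Files API).
    return {"id": object_id, "object": object_type, "deleted": True}

# e.g. handling DELETE /v1/files/file-abc123
delete_response("file-abc123", "file")
# -> {"id": "file-abc123", "object": "file", "deleted": True}
```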
@leseb I'd love to get regular test signals from vLLM (or other providers, but especially vLLM). I'd prefer to run the tests with a "real" model, which requires a GPU. If...
Note that I'm running BFCL with vLLM locally as I exercise and improve the gpt-oss implementation for different parts of vLLM. While I'm in no place to publicly share out...
A big -1 on adding default NetworkPolicy objects out of the box. In my experience, it's quite hard to come up with an appropriate default NetworkPolicy that...
Perhaps @mikehelmick or someone else can elaborate on why a user shouldn't be able to bypass the queue-proxy. Is it a concern around bad actors? Accidental misuse of Serving? Conformance...
I agree that for scaling purposes, the primary traffic needs to flow through the queue-proxy. And, it does today. A user has to intentionally try to bypass the internal or...
This PR has conflicts and is a few months out of date. But, as a more general question, we may want to consider guidance to inference provider contributors on when...
Marking as closed, given the comments above. Some of the distribution images will be large due to the needed pytorch, while others that use only remote providers will be smaller. And,...
To fully implement the Responses API, you have to handle built-in tools (including executing said tools), persistent conversation state, vector and file storage, and probably a few other things outside...
@OriNachum Llama Stack needs a bit of work on the messaging, as it works not just with Llama models but with any model you can run in vLLM, ollama, SaaS providers,...