Ben Browning
I believe the linked code generation already handles (or at least attempts to handle) the fact that OpenAI-compatible API delete methods need to return objects. See a few lines below...
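For context, a minimal sketch of the shape those delete methods are expected to return — following the OpenAI Files API delete response, which returns a small JSON object rather than an empty body. The `delete_response` helper here is hypothetical, just to illustrate the shape:

```python
def delete_response(object_id: str, object_type: str) -> dict:
    # OpenAI-compatible DELETE endpoints return an object like this
    # instead of an empty 204 response (shape follows the Files API).
    return {"id": object_id, "object": object_type, "deleted": True}

# e.g. handling DELETE /v1/files/file-abc123
delete_response("file-abc123", "file")
# -> {"id": "file-abc123", "object": "file", "deleted": True}
```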
@leseb I'd love to get regular test signals from vLLM (or other providers, but especially vLLM). I'd prefer to run the tests with a "real" model, which requires a GPU. If...
Note that I'm running BFCL with vLLM locally as I exercise and improve the gpt-oss implementation for different parts of vLLM. While I'm in no place to publicly share out...
A big -1 on adding default NetworkPolicy objects out of the box. In my experience, it's quite hard to come up with an appropriate default NetworkPolicy that...
Perhaps @mikehelmick or someone else can elaborate on why a user shouldn't be able to bypass the queue-proxy. Is it a concern around bad actors? Accidental misuse of Serving? Conformance...
I agree that for scaling purposes, the primary traffic needs to flow through the queue-proxy. And, it does today. A user has to intentionally try to bypass the internal or...
This PR has conflicts and is a few months out of date. But, as a more general question, we may want to consider guidance to inference provider contributors on when...
Marking as closed, given the comments above. Some of the distribution images will be large due to the needed pytorch, while others that use only remote providers will be smaller. And,...
To fully implement the Responses API, you have to handle built-in tools (including executing said tools), persistent conversation state, vector and file storage, and probably a few other things outside...
@OriNachum Llama Stack needs a bit of work on the messaging, as it works not just with Llama models but with any model you can run in vLLM, ollama, SaaS providers,...