Ben Browning

55 comments by Ben Browning

# Example of using Llama Stack for Responses API with vLLM

We're still actively working on the Responses API in Llama Stack, but here's a minimal example using our latest...
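The comment's actual example is truncated, but a minimal sketch of hitting a Llama Stack server's Responses API (with vLLM as the inference backend) might look like the following. The port, endpoint path, and model id here are illustrative assumptions, not taken from the comment:

```python
# Minimal sketch: POST a Responses API request to a local Llama Stack server.
# BASE_URL, the endpoint path, and the model id are assumptions for
# illustration -- adjust them to match your deployment.
import json
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed local Llama Stack server

payload = {
    "model": "meta-llama/Llama-3.2-3B-Instruct",  # any model your vLLM server hosts
    "input": "Say hello in one short sentence.",
}

request = urllib.request.Request(
    f"{BASE_URL}/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a server running, uncomment to actually send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))
print(request.full_url)
```

An OpenAI-compatible client pointed at the same base URL would work equally well; the raw `urllib` form is used here only to keep the sketch dependency-free.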

With the unit tests and pre-commits now passing, I pulled this again locally to try it manually with a real vLLM server. I don't think it's working quite as expected...

Thanks for all the iteration on this! Looks good to me, so I'm going to go ahead and merge. I don't see any specific reason the health check timeout needed...

@jaideepr97 Is this PR still relevant? I'd be happy to review and help get these fixes in if you're willing to resolve conflicts and the failing pre-commit / tests.

> > we should encourage users who want separated reasoning details to use /v1/responses w/ reasoning={"summary": ...}
>
> I think that is also fair, but I also think that...

> those are great principles. (2) especially makes sense for compliant APIs.
>
> we need to be careful in applying (2). in this case, vllm has a proprietary extension...

I've tried to use our published container images a few times, and agree it would be great for us to build and publish arm64 container images.

So, I have a local branch of our `llama-stack-ops` repo that builds and publishes arm64 images. The core bits in this `llama-stack` repo already support that. Things are a bit...

We currently have a CI job in this repo (providers-build.yml) that installs each distribution. I may prototype what it would look like to add supported architectures as an attribute of...
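To make the idea concrete, a publish job that treats supported architectures as a per-distribution attribute could be sketched in GitHub Actions roughly as follows. This is an illustrative sketch only: the workflow name, distribution names, and image tags are hypothetical and not taken from providers-build.yml or the actual llama-stack-ops workflow:

```yaml
# Illustrative sketch -- NOT the real llama-stack-ops workflow.
name: publish-images
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - distro: starter                      # hypothetical distribution name
            platforms: linux/amd64,linux/arm64   # arm64 only where supported
          - distro: meta-reference-gpu           # hypothetical distribution name
            platforms: linux/amd64
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3        # emulation for cross-arch builds
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          platforms: ${{ matrix.platforms }}
          push: true
          tags: example.org/llama-stack/${{ matrix.distro }}:latest
```

The `include` matrix is what lets each distribution declare its own architecture list, which matches the "architectures as an attribute" idea above.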

I had a few hours to work on this, so I opened https://github.com/meta-llama/llama-stack-ops/pull/33, which will get us building and pushing amd64 images for every distribution, arm64 images for all our distributions...