Road to v1
🚀 Describe the new functionality needed
Overview
The goal for Llama Stack v1 is to enable ISVs and enterprise developers to build AI applications in on-prem and VPC environments. This roadmap is not meant to be a comprehensive list of all tasks, but rather a guide to help us stay on track.
Milestone 1: Foundation & Infrastructure
- [ ] Make sure that the release process is fast and robust
- [ ] Enable integration tests for all APIs (post-training is missing)
- [ ] MCP server deployment and OAuth integration
- [ ] Developer-facing UI for chat completions and tracing
- [ ] Embedding, keyword and hybrid search
- [ ] Document the stores implementation
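For readers unfamiliar with the "hybrid search" item above, here is a minimal sketch of one common way to combine an embedding ranking with a keyword ranking: reciprocal rank fusion (RRF). This is an illustration only. The issue does not say which fusion method Llama Stack will use, and the function name, the `k=60` constant, and the document IDs are all assumptions made for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    where rank starts at 1. `k` is a smoothing constant (assumed value).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from an embedding index and a keyword index.
embedding_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([embedding_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the behavior a hybrid search mode is usually expected to provide.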
Milestone 2: Production Ready APIs and Containers
Standardize all APIs to OpenAI format where possible
- [ ] Embeddings API
- [ ] File search tool / API
- [ ] API separation for independent containers
- [ ] AWS k8s deployment for Llama Stack
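To make "OpenAI format" concrete for the Embeddings API item above, the sketch below builds a request body and reads a response in the shape used by the OpenAI embeddings endpoint (`model` and `input` in the request, a `data` list of `embedding` objects in the response). The model name and the sample response values are placeholders, and nothing here is taken from the Llama Stack codebase.

```python
import json


def build_embeddings_request(model, texts):
    """Return a request body in the OpenAI embeddings shape."""
    return {"model": model, "input": list(texts)}


def extract_vectors(response):
    """Return embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical model name; a real deployment would use its own identifier.
request_body = build_embeddings_request("example-embedding-model", ["hello", "world"])

# Hand-written sample response in the OpenAI shape, for illustration only.
sample_response = {
    "object": "list",
    "model": "example-embedding-model",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "usage": {"prompt_tokens": 2, "total_tokens": 2},
}

vectors = extract_vectors(sample_response)
print(json.dumps(request_body))
print(vectors)
```

Matching this shape is what would let existing OpenAI-compatible clients talk to a Llama Stack deployment without custom adapters.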
Milestone 3: API Hardening
Finalize API work in preparation for the first app deployment
- [ ] Streaming and file search support in Responses API
- [ ] Deprecate non-OpenAI inference endpoint
- [ ] Adopt Moderations API and deprecate run_shield()
- [ ] Unified tool API for Responses and Agents
- [ ] Playground: Agents (responses) + Inference + VectorIO
- [ ] Prometheus and 23ai provider integrations
Milestone 4: Enterprise readiness features
- [ ] Add /health endpoints for each container within the Stack
- [ ] Support authentication (e.g., telemetry logs for user A should not be visible to user B)
- [ ] Allow updating resource attributes in the Auth API / ABAC structure
- [ ] API key management for partners
- [ ] Auditing: all CRUD operations must be logged via Telemetry and be queryable efficiently
- [ ] Kubernetes Operator
- [ ] Standardize provider errors
- [ ] Support for per-distro UI components
- [ ] Phone home in Llama Stack via an opt-in flow to observe usage metrics
- [ ] Process to collect canary datasets from developers via an opt-in flow to provide feedback for research teams
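The authentication and ABAC items above can be illustrated with a small attribute-based visibility check. This is a sketch under stated assumptions, not the Llama Stack Auth API: the `User` and `LogRecord` types, their fields, and the rule that a record is readable by its owner or by anyone holding all of its required attributes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class LogRecord:
    owner: str
    message: str
    # Attributes a non-owner must hold to read this record (hypothetical rule).
    required_attributes: frozenset = field(default_factory=frozenset)


def can_read(user, record):
    """Owners can always read; others need every required attribute."""
    if record.owner == user.name:
        return True
    # An empty requirement set grants nothing to non-owners.
    return bool(record.required_attributes) and record.required_attributes <= user.attributes


def visible_logs(user, records):
    """Return only the records this user is allowed to see."""
    return [r for r in records if can_read(user, r)]


user_a = User("user_a")
user_b = User("user_b", frozenset({"team:ops"}))
logs = [
    LogRecord("user_a", "private trace for A"),
    LogRecord("user_a", "shared ops trace", frozenset({"team:ops"})),
    LogRecord("user_b", "private trace for B"),
]

print([r.message for r in visible_logs(user_a, logs)])
print([r.message for r in visible_logs(user_b, logs)])
```

With these rules, user A's private trace never appears in user B's results, and updating a record's `required_attributes` changes who can see it without touching the filtering code.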
Milestone 5: First On-Prem PoC
💡 Why is this needed? What if we don't build it?
Having a clear plan to get to v1 will help the community prioritize the most important features and improvements.