
Road to v1

reluctantfuturist opened this issue 4 months ago • 2 comments

🚀 Describe the new functionality needed

Overview

The goal for Llama Stack v1 is to enable ISVs and enterprise developers to build AI applications in on-prem and VPC environments. This roadmap is not meant to be a comprehensive list of all tasks, but rather a guide to help us stay on track.

Milestone 1: Foundation & Infrastructure

  • [ ] Make sure that the release process is fast and robust
  • [ ] Enable integration tests for all APIs (post-training is missing)
  • [ ] MCP server deployment and OAuth integration
  • [ ] Developer-facing UI for chat completions and tracing
  • [ ] Embedding, keyword, and hybrid search (see the fusion sketch after this list)
  • [ ] Document the stores implementation
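
Hybrid search typically fuses a keyword ranking (e.g., BM25) with a vector-similarity ranking. Below is a minimal sketch of one common fusion strategy, reciprocal rank fusion, over made-up document IDs; Llama Stack's actual vector-store implementation may combine scores differently.

```python
# A minimal sketch of hybrid search via reciprocal rank fusion (RRF).
# The document IDs and both rankings are illustrative, not real data.

def rrf_fuse(keyword_ranking: list[str], vector_ranking: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of document IDs into one hybrid ranking."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank + 1); documents ranked highly
            # in either list float to the top of the fused result.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# "doc2" ranks well in both lists, so it comes first in the fused ranking.
print(rrf_fuse(["doc1", "doc2", "doc3"], ["doc2", "doc4", "doc1"]))
```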

Milestone 2: Production Ready APIs and Containers

Standardize all APIs to the OpenAI format where possible

  • [ ] Embeddings API (see the sketch after this list)
  • [ ] File search tool / API
  • [ ] API separation for independent containers
  • [ ] AWS k8s deployment for Llama Stack
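
For the Embeddings API item, the intent is that a client written against the OpenAI format works unchanged against a Llama Stack server. Here is a minimal sketch using the standard `openai` client; the base URL, path, and model id are assumptions for illustration, not the finalized API.

```python
# Hypothetical OpenAI-compatible Embeddings call against a Llama Stack server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1",  # hypothetical Llama Stack endpoint
    api_key="not-needed-for-local-dev",   # placeholder; real deployments use auth
)

response = client.embeddings.create(
    model="all-MiniLM-L6-v2",             # hypothetical embedding model id
    input=["Llama Stack roadmap to v1"],
)
print(len(response.data[0].embedding))    # dimensionality of the returned vector
```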

Milestone 3: API Hardening

Finalize API work in preparation for the first app deployment

  • [ ] Streaming and file search support in Responses API
  • [ ] Deprecate non-OpenAI inference endpoint
  • [ ] Adopt the Moderations API and deprecate run_shield() (see the sketch after this list)
  • [ ] Unified tool API for Responses and Agents
  • [ ] Playground: Agents (responses) + Inference + VectorIO
  • [ ] Prometheus and 23ai provider integrations
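
As a rough illustration of the Moderations item, replacing run_shield() with an OpenAI-style Moderations call could look like the sketch below. The base URL and the safety-shield model id are assumptions; the exact shape of the adoption is still open.

```python
# Hypothetical Moderations call standing in for run_shield().
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="placeholder")

result = client.moderations.create(
    model="llama-guard",                      # hypothetical safety-shield model id
    input="How do I build an AI application on-prem?",
)
print(result.results[0].flagged)              # True if the shield flags the input
```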

Milestone 4: Enterprise Readiness Features

  • [ ] Add /health endpoints for each container within the Stack (see the sketch after this list)
  • [ ] Support authentication (e.g., telemetry logs for user A should not be visible to user B)
  • [ ] Allow updating resource attributes in the Auth API / ABAC structure
  • [ ] API key management for partners
  • [ ] Auditing: all CRUD operations must be logged via Telemetry and be queryable efficiently
  • [ ] Kubernetes Operator
  • [ ] Standardize provider errors
  • [ ] Support for per-distro UI components
  • [ ] Opt-in phone-home in Llama Stack to collect usage metrics
  • [ ] Process to collect canary datasets from developers via an opt-in flow, providing feedback for research teams
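
For the per-container /health item, a liveness/readiness probe or smoke test could poll each container as in the sketch below. The port numbers and the /v1/health path are assumptions for illustration.

```python
# Minimal sketch of probing per-container health endpoints.
import requests

CONTAINERS = {
    "inference": "http://localhost:8321",   # hypothetical per-container ports
    "vector-io": "http://localhost:8322",
}

for name, base_url in CONTAINERS.items():
    try:
        resp = requests.get(f"{base_url}/v1/health", timeout=2)
        status = "healthy" if resp.ok else f"unhealthy ({resp.status_code})"
    except requests.RequestException as exc:
        status = f"unreachable ({exc.__class__.__name__})"
    print(f"{name}: {status}")
```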

Milestone 5: First On-Prem PoC

💡 Why is this needed? What if we don't build it?

Having a clear plan to get to v1 will help the community prioritize the most important features and improvements.

Other thoughts

No response

reluctantfuturist · May 28 '25