feat: Add MLflow Prompt Registry provider
Summary
This PR adds a new remote MLflow provider for the Prompts API, enabling centralized prompt management and versioning using MLflow's Prompt Registry (MLflow 3.4+).
What's New
Remote Provider: remote::mlflow
A production-ready provider that integrates Llama Stack's Prompts API with MLflow's centralized prompt registry, supporting:
- Version Control: Immutable prompt versioning with full history
- Default Version Management: Easy version switching via aliases
- Auto Variable Extraction: Automatic detection of `{{ variable }}` placeholders
- Centralized Storage: Team collaboration via a shared MLflow server
- Metadata Preservation: Llama Stack metadata stored as MLflow tags
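The PR does not show the extraction code itself, but the auto variable extraction feature can be illustrated with a minimal sketch. The function name and regex below are illustrative only, not the provider's actual implementation:

```python
import re

# Matches {{ variable }} placeholders, with optional inner whitespace.
VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_variables(template: str) -> list[str]:
    """Return unique placeholder names in order of first appearance."""
    seen: dict[str, None] = {}
    for name in VARIABLE_PATTERN.findall(template):
        seen.setdefault(name)
    return list(seen)


template = "Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}"
print(extract_variables(template))  # ['num_sentences', 'text']
```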
Quick Start
1. Configure Llama Stack
Basic configuration with SQLite (default):
```yaml
prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db
```
With PostgreSQL:
```yaml
prompts:
  - provider_id: postgres-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: postgresql://user:pass@localhost/llama_stack
```
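The snippets above configure the inline reference provider; the new `remote::mlflow` provider would be configured in the same `prompts` section. The field names below are assumptions for illustration (see remote_mlflow.mdx in this PR for the actual schema):

```yaml
prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      # Assumed field names -- consult remote_mlflow.mdx for the real schema
      tracking_uri: http://mlflow.example.com:5000
      api_token: ${env.MLFLOW_TRACKING_TOKEN}
```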
2. Use the Prompts API
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"],
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve the prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update the prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True,
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")

# Delete the prompt
client.prompts.delete(prompt_id=prompt.prompt_id)
```
PR is now ready for review and includes the following updates:
- Moved the previous prompts.py to an inline provider (see inline_reference.mdx for details)
- Defined a remote MLflow provider with authentication support (see remote_mlflow.mdx for details)
- Removed all dependencies on prompt caching
@mattf
why do we need to maintain a mapping from prompt id to mlflow prompt name?
The idea is to distinguish Llama Stack-managed prompts from other prompts that might exist in the same MLflow registry.
when would a deployer want to set use_metadata_tags=False? can we always use metadata and skip the id/name translations?
We can remove that option because, in practice, setting it to false has significant downsides. Let me remove the option.
thanks. do we still need to have the id mapping?
@franciscojavierarceo @mattf I have addressed all the comments. Is there anything else you think should be addressed before approval?
This pull request has merge conflicts that must be resolved before it can be merged. @williamcaban please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork