feat: support Docker Model Runner
## Summary
Add Docker Model Runner as a local provider option alongside Ollama in Rig, following the same implementation pattern.
## Motivation

Docker Desktop now includes Model Runner (Beta in 4.40+), which provides:

- Built-in llama.cpp inference engine
- OpenAI-compatible API at `http://localhost:12434/v1`
- Same interface as Ollama, but with Docker's ecosystem benefits
- Native host execution on Apple Silicon for optimal performance
Reference: [Docker Blog: Run LLMs Locally](https://www.docker.com/blog/run-llms-locally/)
## Implementation
Following Ollama's pattern, create a minimal Docker provider:
```rust
// rig-provider-docker/src/lib.rs
use url::Url;

const DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub struct Client {
    base_url: Url,
    http_client: reqwest::Client,
}

impl Client {
    pub fn new() -> Self {
        // Reuse Ollama's structure, just change the base URL
        Self {
            base_url: Url::parse(DOCKER_API_BASE_URL).expect("Valid URL"),
            http_client: reqwest::Client::new(),
        }
    }
}

// Since Docker Model Runner uses OpenAI-compatible endpoints, the completion
// model can mostly reuse the OpenAI provider implementation instead of Ollama's:
//
// impl CompletionModel for DockerModel {
//     // Forward to the OpenAI-style /chat/completions endpoint
// }
```
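To make the forwarding concrete, here is a rough sketch of what the completion call could look like, using hand-rolled request/response handling with `reqwest` and `serde_json` rather than rig's actual `CompletionModel` trait; `DockerModel`, its fields, and `complete` are illustrative names only.

```rust
// Sketch only: forwards a prompt to the OpenAI-style chat completions
// endpoint exposed by Docker Model Runner. In the real provider this logic
// would sit behind rig's CompletionModel trait.
use serde_json::{json, Value};

pub struct DockerModel {
    http_client: reqwest::Client,
    base_url: String, // e.g. "http://localhost:12434/v1"
    model: String,    // model identifier as listed by the runner
}

impl DockerModel {
    pub async fn complete(&self, prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
        // Standard OpenAI chat completions payload
        let body = json!({
            "model": self.model,
            "messages": [{ "role": "user", "content": prompt }],
        });

        let response: Value = self
            .http_client
            .post(format!("{}/chat/completions", self.base_url))
            .json(&body)
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;

        // Extract the assistant message from the first choice
        Ok(response["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or_default()
            .to_string())
    }
}
```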
## Usage
Exactly the same as Ollama:
```rust
/// This example requires Docker Desktop with Model Runner enabled.
use rig::prelude::*;
use rig::{completion::Prompt, providers};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Create docker client
    let client = providers::docker::Client::new();

    // Create agent with a single context prompt
    let comedian_agent = client
        .agent("qwen2.5:14b")
        .preamble("You are a comedian here to entertain the user using humour and jokes.")
        .build();

    // Prompt the agent and print the response
    let response = comedian_agent.prompt("Entertain me!").await?;
    println!("{response}");

    Ok(())
}
```
## Key Differences from Ollama

- Base URL: `http://localhost:12434/v1` (Docker) vs `http://localhost:11434` (Ollama)
- API Style: OpenAI-compatible vs Ollama's custom format
- Model Management: `docker model pull` vs `ollama pull`
- Model Format: Uses GGUF models packaged as OCI artifacts
## Implementation Notes
- Can largely copy Ollama's client structure
- Use OpenAI's request/response format instead of Ollama's
- Environment variable: `DOCKER_MODEL_RUNNER_URL` (if needed; see the sketch after this list)
- Same `ProviderClient`, `CompletionClient`, `EmbeddingsClient` traits
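A possible shape for the environment-variable override mentioned above (sketch only; `from_url` and `from_env` are suggested names, building on the `Client` struct and `DOCKER_API_BASE_URL` constant from the Implementation section):

```rust
// Builds on the Client struct and DOCKER_API_BASE_URL constant shown above.
impl Client {
    /// Build a client against an explicit base URL.
    pub fn from_url(base_url: &str) -> Self {
        Self {
            base_url: Url::parse(base_url).expect("Valid URL"),
            http_client: reqwest::Client::new(),
        }
    }

    /// Honour DOCKER_MODEL_RUNNER_URL when set, otherwise fall back to the default.
    pub fn from_env() -> Self {
        let base_url = std::env::var("DOCKER_MODEL_RUNNER_URL")
            .unwrap_or_else(|_| DOCKER_API_BASE_URL.to_string());
        Self::from_url(&base_url)
    }
}
```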
## Benefits for Rig Users
- Zero configuration: Works out of the box with Docker Desktop
- Performance: Native execution avoids VM overhead
- Familiar tooling: Uses standard Docker commands
- Future-proof: Docker is expanding Model Runner capabilities
## Questions for Maintainers
- Should we detect if Docker Model Runner is available at runtime? (a possible probe is sketched after this list)
- Preferred module name: `docker` or `docker_model_runner`?
- Should we support Docker's model pulling via the API?
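On the first question, a lightweight runtime check seems possible, assuming the runner exposes the usual OpenAI-style `GET /models` listing (sketch only, again building on the `Client` from the Implementation section; `is_available` is a suggested name):

```rust
// Builds on the Client struct shown in the Implementation section.
impl Client {
    /// Returns true if something answers on the configured base URL.
    /// Assumes the OpenAI-style GET /models listing endpoint is available.
    pub async fn is_available(&self) -> bool {
        self.http_client
            .get(format!("{}/models", self.base_url))
            .send()
            .await
            .map(|response| response.status().is_success())
            .unwrap_or(false)
    }
}
```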
## References
- [Docker Blog: Run LLMs Locally](https://www.docker.com/blog/run-llms-locally/)
- [Docker Model Runner Documentation](https://docs.docker.com/desktop/model-runner/)
---

This seems like it's already compliant with the OpenAI Chat Completions API; is there any specific reason to add this as a separate provider?

I'm all for adding new model providers, but we should be wary of adding too many, as each new one adds maintenance overhead.
Good question! I don't have a definitive answer. The promise of Docker Model Runner is that its inference engine is better optimized for multiple cores, which could mean better performance than Ollama or LM Studio (though I haven't benchmarked this myself). Could we create a wrapper on top of the OpenAI API for Docker?
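For what it's worth, if rig's OpenAI client can be pointed at a custom base URL, something along these lines might already work today without a new provider (`from_url` is a placeholder here; the exact constructor depends on the current rig-core API, and Model Runner shouldn't need a real API key):

```rust
use rig::prelude::*;
use rig::{completion::Prompt, providers::openai};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Point rig's existing OpenAI provider at Docker Model Runner.
    // `from_url` stands in for whatever constructor rig-core currently
    // exposes for overriding the base URL; the key is a dummy value.
    let client = openai::Client::from_url("docker", "http://localhost:12434/v1");

    let agent = client
        .agent("qwen2.5:14b")
        .preamble("You are a comedian here to entertain the user using humour and jokes.")
        .build();

    let response = agent.prompt("Entertain me!").await?;
    println!("{response}");

    Ok(())
}
```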