
feat: support Docker Model Runner

Open hghalebi opened this issue 3 months ago • 3 comments

Summary

Add Docker Model Runner as a local provider option alongside Ollama in Rig, following the same implementation pattern.

Motivation

Docker Desktop now includes Model Runner (Beta in 4.40+) which provides:

  • Built-in llama.cpp inference engine
  • OpenAI-compatible API at http://localhost:12434/v1
  • Same interface as Ollama but with Docker's ecosystem benefits
  • Native host execution on Apple Silicon for optimal performance

Reference: [Docker Blog: Run LLMs Locally](https://www.docker.com/blog/run-llms-locally/)

Implementation

Following Ollama's pattern, create a minimal Docker provider:

// rig-provider-docker/src/lib.rs (sketch)
use reqwest::Url;

const DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub struct Client {
    base_url: Url,
    http_client: reqwest::Client,
}

impl Client {
    pub fn new() -> Self {
        // Reuse Ollama's structure, just change the base URL
        Self {
            base_url: Url::parse(DOCKER_API_BASE_URL).expect("Valid URL"),
            http_client: reqwest::Client::new(),
        }
    }
}

// Since Docker Model Runner exposes OpenAI-compatible endpoints, we can
// mostly reuse the OpenAI provider implementation instead of Ollama's.
impl CompletionModel for DockerModel {
    // Forward to the OpenAI-style /v1/chat/completions endpoint
}
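
Building on that point, the thinnest version of this might simply point rig's existing OpenAI client at the Docker base URL rather than defining new request/response types at all. A minimal sketch, assuming the OpenAI client exposes a constructor that accepts a custom base URL (the from_url name and the placeholder API key are assumptions, not confirmed rig API):

// Sketch: reuse rig's OpenAI provider against the Docker Model Runner
// endpoint instead of writing a new provider from scratch.
use rig::providers::openai;

const DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub fn docker_client() -> openai::Client {
    // Model Runner needs no API key, so a placeholder value is passed.
    // `from_url` is assumed here; substitute whatever base-URL override
    // the OpenAI client actually provides.
    openai::Client::from_url("docker", DOCKER_API_BASE_URL)
}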

Usage

Exactly the same as Ollama:

/// This example requires Docker Desktop with Model Runner enabled.
use rig::prelude::*;
use rig::{completion::Prompt, providers};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Create docker client
    let client = providers::docker::Client::new();
    
    // Create agent with a single context prompt
    let comedian_agent = client
        .agent("qwen2.5:14b")
        .preamble("You are a comedian here to entertain the user using humour and jokes.")
        .build();
    
    // Prompt the agent and print the response
    let response = comedian_agent.prompt("Entertain me!").await?;
    println!("{response}");
    
    Ok(())
}

Key Differences from Ollama

  1. Base URL: http://localhost:12434/v1 (Docker) vs http://localhost:11434 (Ollama)
  2. API Style: OpenAI-compatible vs Ollama's custom format (a raw request against the Docker endpoint is sketched after this list)
  3. Model Management: docker model pull vs ollama pull
  4. Model Format: Uses GGUF models packaged as OCI artifacts
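
To make differences 1 and 2 concrete, here is a minimal request in the OpenAI chat-completions wire format sent to the Docker base URL. The /chat/completions path and the model name are illustrative assumptions; use a model you have pulled with docker model pull.

// Standalone check of the OpenAI-compatible endpoint (no rig involved).
// Requires reqwest with the "json" feature, serde_json, and tokio.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let body = json!({
        "model": "qwen2.5:14b",
        "messages": [{ "role": "user", "content": "Say hello" }]
    });

    let response = reqwest::Client::new()
        .post("http://localhost:12434/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{response}");
    Ok(())
}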

Implementation Notes

  • Can largely copy Ollama's client structure
  • Use OpenAI's request/response format instead of Ollama's
  • Environment variable: DOCKER_MODEL_RUNNER_URL to override the base URL (if needed; a fallback sketch follows this list)
  • Same ProviderClient, CompletionClient, EmbeddingsClient traits
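
A minimal sketch of the environment-variable fallback mentioned above, assuming DOCKER_MODEL_RUNNER_URL simply overrides the default base URL:

// Resolve the base URL from DOCKER_MODEL_RUNNER_URL, falling back to the
// default Docker Model Runner address when the variable is unset.
use std::env;

const DEFAULT_DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub fn resolve_base_url() -> String {
    env::var("DOCKER_MODEL_RUNNER_URL")
        .unwrap_or_else(|_| DEFAULT_DOCKER_API_BASE_URL.to_string())
}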

Benefits for Rig Users

  • Zero configuration: Works out of the box with Docker Desktop
  • Performance: Native execution avoids VM overhead
  • Familiar tooling: Uses standard Docker commands
  • Future-proof: Docker is expanding Model Runner capabilities

Questions for Maintainers

  1. Should we detect if Docker Model Runner is available at runtime? (A possible probe is sketched after this list.)
  2. Preferred module name: docker or docker_model_runner?
  3. Should we support Docker's model pulling via the API?
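
For question 1, runtime detection could be as simple as probing the OpenAI-compatible models endpoint and treating any successful response as "Model Runner is available". The /models path and the timeout are assumptions in this sketch:

// Probe the endpoint to see whether Docker Model Runner is reachable.
use std::time::Duration;

pub async fn model_runner_available(base_url: &str) -> bool {
    let client = match reqwest::Client::builder()
        .timeout(Duration::from_secs(2))
        .build()
    {
        Ok(client) => client,
        Err(_) => return false,
    };

    client
        .get(format!("{base_url}/models"))
        .send()
        .await
        .map(|response| response.status().is_success())
        .unwrap_or(false)
}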

hghalebi · Sep 14 '25 21:09

This seems like it's already compliant with the OpenAI Chat Completions API, so is there any specific reason to add it as a separate provider?

I'm all for adding new model providers, but we should be wary of adding too many, as each new one also adds maintenance overhead.

joshua-mo-143 · Sep 14 '25 21:09

Good question! I don't have a definitive answer. The promise of Docker Model Runner is that its inference engine is better optimized for multiple cores, which could mean better performance than Ollama or LM Studio (though I haven't benchmarked this myself). Could we create a wrapper on top of the OpenAI API for Docker?

hghalebi · Sep 14 '25 21:09