
feat: support Docker Model Runner

Open hghalebi opened this issue 3 months ago • 3 comments

Summary

Add Docker Model Runner as a local provider option alongside Ollama in Rig, following the same implementation pattern.

Motivation

Docker Desktop now includes Model Runner (Beta in 4.40+) which provides:

  • Built-in llama.cpp inference engine
  • OpenAI-compatible API at http://localhost:12434/v1
  • Same interface as Ollama but with Docker's ecosystem benefits
  • Native host execution on Apple Silicon for optimal performance

Reference: [Docker Blog: Run LLMs Locally](https://www.docker.com/blog/run-llms-locally/)

Implementation

Following Ollama's pattern, create a minimal Docker provider:

// rig-provider-docker/src/lib.rs (sketch)
use reqwest::Url;

const DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub struct Client {
    base_url: Url,
    http_client: reqwest::Client,
}

impl Client {
    pub fn new() -> Self {
        // Reuse Ollama's structure, just change the base URL
        Self {
            base_url: Url::parse(DOCKER_API_BASE_URL).expect("Valid URL"),
            http_client: reqwest::Client::new(),
        }
    }
}

// Since Docker Model Runner exposes OpenAI-compatible endpoints, we can
// mostly reuse the OpenAI provider implementation instead of Ollama's.
impl CompletionModel for DockerModel {
    // Forward to the OpenAI-style /v1/chat/completions endpoint
}
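
Building on that point, the thinnest version of this might simply point rig's existing OpenAI client at the Docker base URL rather than defining new request/response types at all. A minimal sketch, assuming the OpenAI client exposes a constructor that accepts a custom base URL (the from_url name and the placeholder API key are assumptions, not confirmed rig API):

// Sketch: reuse rig's OpenAI provider against the Docker Model Runner
// endpoint instead of writing a new provider from scratch.
use rig::providers::openai;

const DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub fn docker_client() -> openai::Client {
    // Model Runner needs no API key, so a placeholder value is passed.
    // `from_url` is assumed here; substitute whatever base-URL override
    // the OpenAI client actually provides.
    openai::Client::from_url("docker", DOCKER_API_BASE_URL)
}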

Usage

Exactly the same as Ollama:

/// This example requires Docker Desktop with Model Runner enabled.
use rig::prelude::*;
use rig::{completion::Prompt, providers};

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Create docker client
    let client = providers::docker::Client::new();
    
    // Create agent with a single context prompt
    let comedian_agent = client
        .agent("qwen2.5:14b")
        .preamble("You are a comedian here to entertain the user using humour and jokes.")
        .build();
    
    // Prompt the agent and print the response
    let response = comedian_agent.prompt("Entertain me!").await?;
    println!("{response}");
    
    Ok(())
}

Key Differences from Ollama

  1. Base URL: http://localhost:12434/v1 (Docker) vs http://localhost:11434 (Ollama)
  2. API Style: OpenAI-compatible vs Ollama's custom format (a raw request against the Docker endpoint is sketched after this list)
  3. Model Management: docker model pull vs ollama pull
  4. Model Format: Uses GGUF models packaged as OCI artifacts
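
To make differences 1 and 2 concrete, here is a minimal request in the OpenAI chat-completions wire format sent to the Docker base URL. The /chat/completions path and the model name are illustrative assumptions; use a model you have pulled with docker model pull.

// Standalone check of the OpenAI-compatible endpoint (no rig involved).
// Requires reqwest with the "json" feature, serde_json, and tokio.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let body = json!({
        "model": "qwen2.5:14b",
        "messages": [{ "role": "user", "content": "Say hello" }]
    });

    let response = reqwest::Client::new()
        .post("http://localhost:12434/v1/chat/completions")
        .json(&body)
        .send()
        .await?
        .text()
        .await?;

    println!("{response}");
    Ok(())
}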

Implementation Notes

  • Can largely copy Ollama's client structure
  • Use OpenAI's request/response format instead of Ollama's
  • Environment variable: DOCKER_MODEL_RUNNER_URL to override the base URL (if needed; a fallback sketch follows this list)
  • Same ProviderClient, CompletionClient, EmbeddingsClient traits
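
A minimal sketch of the environment-variable fallback mentioned above, assuming DOCKER_MODEL_RUNNER_URL simply overrides the default base URL:

// Resolve the base URL from DOCKER_MODEL_RUNNER_URL, falling back to the
// default Docker Model Runner address when the variable is unset.
use std::env;

const DEFAULT_DOCKER_API_BASE_URL: &str = "http://localhost:12434/v1";

pub fn resolve_base_url() -> String {
    env::var("DOCKER_MODEL_RUNNER_URL")
        .unwrap_or_else(|_| DEFAULT_DOCKER_API_BASE_URL.to_string())
}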

Benefits for Rig Users

  • Zero configuration: Works out of the box with Docker Desktop
  • Performance: Native execution avoids VM overhead
  • Familiar tooling: Uses standard Docker commands
  • Future-proof: Docker is expanding Model Runner capabilities

Questions for Maintainers

  1. Should we detect if Docker Model Runner is available at runtime? (A possible probe is sketched after this list.)
  2. Preferred module name: docker or docker_model_runner?
  3. Should we support Docker's model pulling via the API?
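
For question 1, runtime detection could be as simple as probing the OpenAI-compatible models endpoint and treating any successful response as "Model Runner is available". The /models path and the timeout are assumptions in this sketch:

// Probe the endpoint to see whether Docker Model Runner is reachable.
use std::time::Duration;

pub async fn model_runner_available(base_url: &str) -> bool {
    let client = match reqwest::Client::builder()
        .timeout(Duration::from_secs(2))
        .build()
    {
        Ok(client) => client,
        Err(_) => return false,
    };

    client
        .get(format!("{base_url}/models"))
        .send()
        .await
        .map(|response| response.status().is_success())
        .unwrap_or(false)
}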

hghalebi · Sep 14 '25 21:09

This seems like it's already compliant with the OpenAI Chat Completions API, so is there any specific reason to add it as a separate provider?

I'm all for adding new model providers, but we should be wary of adding too many, as each new one also adds maintenance overhead.

joshua-mo-143 · Sep 14 '25 21:09

Good question! I don't have a definitive answer. The promise of Docker Model Runner is that its inference engine is better optimized for multiple cores, which could mean better performance than Ollama or LM Studio (though I haven't benchmarked this myself). Could we create a wrapper on top of the OpenAI API for Docker?

hghalebi · Sep 14 '25 21:09