
Add Ollama Support with Local Model Discovery and Embeddings

Open Milofax opened this issue 1 month ago • 5 comments

Summary

This PR adds comprehensive Ollama support to Archon, letting users run local LLM models for both chat and embeddings.

Features Added:

  • Ollama Integration: Full support for Ollama as LLM and embedding provider
  • Model Discovery: Automatic detection of available Ollama models via API
  • UI Configuration: New OllamaConfigurationPanel for easy setup in Settings
  • RAG Settings Update: Support for Ollama embeddings in RAG pipeline
  • API Mode Selection: Native Ollama API or OpenAI-compatible mode
  • Documentation: Added INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md

Technical Changes:

  • python/src/server/api_routes/ollama_api.py - Extended API endpoints
  • python/src/server/services/embeddings/embedding_service.py - Ollama embedding support
  • python/src/server/services/llm_provider_service.py - Ollama LLM provider
  • python/src/server/services/ollama/model_discovery_service.py - Model discovery
  • archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx - UI panel
  • archon-ui-main/src/components/settings/RAGSettings.tsx - Updated settings

Configuration:

Users can configure Ollama via the Settings page:

  • Ollama Base URL (local or remote)
  • API Mode (native or OpenAI-compatible)
  • Embedding model selection
  • Chat model selection

Test plan

  • [x] Tested with local Ollama instance
  • [x] Tested with remote Ollama server (with auth token)
  • [x] Verified model discovery works correctly
  • [x] Verified embeddings generation with Ollama models
  • [ ] E2E tests included in archon-ui-main/tests/e2e/ollama-api-mode.spec.ts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-instance authentication for Ollama (chat & embeddings), Ollama API mode switch, native Ollama embeddings, and automatic model discovery.
  • Documentation

    • Added comprehensive infrastructure guide and a bilingual quick-start.
  • Tests

    • New E2E and expanded unit/integration tests covering API modes and auth-token flows.
  • Chores

    • Docker services now restart automatically; added Playwright dev dependency; updated ignore rules to exclude Supabase local data and test results.
  • Style

    • Removed unused icon imports and minor UI refinements.


Milofax avatar Nov 21 '25 18:11 Milofax

Walkthrough

Adds per-instance Ollama authentication (frontend persistence, UI, and backend propagation), a native Ollama embeddings adapter, auth-aware model discovery and health checks, Supabase infra docs/quickstart, Playwright/e2e and unit tests, Docker restart policies, multi-dimensional search RPCs, and PDF/OCR extraction improvements.

Changes

Cohort / File(s) Summary
Docs & Repo config
/.gitignore, archon-ui-main/.gitignore, INFRASTRUCTURE.md, QUICKSTART.md, PLAN.md
Add Supabase ignore patterns; add INFRASTRUCTURE and QUICKSTART guides; document Ollama auth changes and deployment/health-check notes.
Frontend: RAG & Ollama config (RAG/Ollama UI & types)
archon-ui-main/src/components/settings/RAGSettings.tsx, archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx, archon-ui-main/src/components/settings/types/OllamaTypes.ts, archon-ui-main/src/services/credentialsService.ts
Add per-instance useAuth/authToken fields, UI controls, persist tokens into ragSettings (OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN), and update related types (including CHAT_MODEL, OLLAMA_API_MODE).
Frontend: utilities & tests
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts, archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
New utility syncEmbeddingFromLLM and unit tests covering name/url/useAuth/authToken sync and edge cases.
Frontend: e2e & build deps
archon-ui-main/package.json, archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
Add @playwright/test devDependency and new Playwright e2e test validating Ollama API mode UI flows and persistence.
Frontend: minor import cleanup & ignore
archon-ui-main/src/components/*, archon-ui-main/.gitignore
Remove unused lucide-react imports across components; add test-results/ to .gitignore.
Backend: API routes & token mapping
python/src/server/api_routes/ollama_api.py
Derive per-URL tokens from rag_strategy, normalize instance URLs, map tokens to instances, and pass tokens into discovery/health/validate endpoints with logging.
Backend: model discovery & health checks
python/src/server/services/ollama/model_discovery_service.py
Add optional auth_token to discover_models/check_instance_health and support an auth_tokens map for multi-instance discovery; attach Authorization header when token provided.
Backend: LLM provider adjustments
python/src/server/services/llm_provider_service.py
Select Ollama auth token based on operation (chat vs embedding), default to "required-but-ignored" when absent, resolve embedding URL (/v1) as needed, and extend validate_provider_instance to accept auth_token.
Backend: Embeddings adapter
python/src/server/services/embeddings/embedding_service.py
Add NativeOllamaEmbeddingAdapter for Ollama native /api/embeddings with optional Bearer token; adapter selection respects OLLAMA_API_MODE (native vs OpenAI-compatible); a sketch of the call shape follows this table.
Backend: search & storage
python/src/server/services/search/*, python/src/server/services/storage/base_storage_service.py
Switch to multi-dimensional RPCs (pass embedding_dimension) for hybrid search RPCs; improve chunk-splitting logic to avoid breaking code blocks and prefer headings/paragraphs.
Backend: PDF/OCR processing & deps
python/src/server/utils/document_processing.py, python/src/server/utils/ocr_processing.py, python/pyproject.toml
Add OCR-based extraction (pytesseract/pdf2image) and pymupdf4llm/pdfplumber/PyPDF2 fallbacks; add OCR dependencies to pyproject.
Backend: tests & minor changes
python/tests/*, python/src/agent_work_orders/utils/state_reconciliation.py, python/tests/test_ollama_auth_token.py, python/tests/test_async_llm_provider_service.py
Add tests for Ollama auth propagation and llm-provider scenarios; whitespace-only reflow in state_reconciliation.
Docker
docker-compose.yml
Add restart: unless-stopped to archon-server, archon-mcp, archon-agents, and archon-frontend services.
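
For concreteness, here is a minimal sketch of what a native-mode embedding call with an optional per-instance Bearer token could look like. This is not the PR's actual NativeOllamaEmbeddingAdapter code; the function name and parameters are illustrative, and it assumes httpx plus Ollama's documented /api/embeddings request shape.

  # Minimal sketch (not the PR's NativeOllamaEmbeddingAdapter): a native
  # Ollama embedding call with an optional Bearer token. Assumes httpx.
  import httpx

  async def embed_native(
      base_url: str,
      model: str,
      text: str,
      auth_token: str | None = None,
  ) -> list[float]:
      # Attach Authorization only when a token is configured, mirroring
      # the per-instance auth behavior described in the table above.
      headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
      async with httpx.AsyncClient(base_url=base_url, headers=headers) as client:
          resp = await client.post(
              "/api/embeddings",
              json={"model": model, "prompt": text},
          )
          resp.raise_for_status()
          return resp.json()["embedding"]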

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant UI as Frontend (RAGSettings / OllamaConfig)
  participant API as Archon API (ollama_api)
  participant Discovery as ModelDiscoveryService
  participant Ollama as Ollama Instance

  User->>UI: Configure Ollama URL + useAuth + authToken
  UI->>API: Save ragSettings (includes per-instance tokens)
  User->>UI: Trigger health check / discover models
  UI->>API: GET /health or /discover with instance URLs
  API->>API: Normalize URLs, map tokens from rag_strategy
  API->>Discovery: check_instance_health/discover_models(url, auth_token)
  Discovery->>Ollama: HTTP request (Authorization: Bearer token if provided)
  Ollama-->>Discovery: health/models response
  Discovery-->>API: aggregated result
  API-->>UI: health/discovery result
  UI-->>User: display status and available models

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • python/src/server/services/embeddings/embedding_service.py (NativeOllamaEmbeddingAdapter: concurrency, error mapping, auth handling)
    • python/src/server/services/llm_provider_service.py (token selection between chat vs embedding, fallback behavior)
    • python/src/server/api_routes/ollama_api.py (URL normalization and token mapping correctness)
    • Frontend token lifecycle and persistence: RAGSettings.tsx, OllamaConfigurationPanel.tsx, credentialsService.ts
    • PDF/OCR integration and fallbacks: python/src/server/utils/document_processing.py, python/src/server/utils/ocr_processing.py
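
To make the chat-vs-embedding token selection called out above concrete, a hypothetical version of that logic (the real code is in llm_provider_service.py) could look like:

  # Hypothetical sketch of operation-based token selection; not the
  # actual llm_provider_service code.
  def select_ollama_token(settings: dict, operation: str) -> str:
      key = (
          "OLLAMA_EMBEDDING_AUTH_TOKEN"
          if operation == "embedding"
          else "OLLAMA_CHAT_AUTH_TOKEN"
      )
      # Ollama's OpenAI-compatible endpoint requires *some* API key even
      # though it ignores it, hence the documented fallback value.
      return settings.get(key) or "required-but-ignored"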

Possibly related PRs

  • coleam00/Archon#643 — Overlapping Ollama integration changes (frontend/backend auth token plumbing and model discovery).
  • coleam00/Archon#560 — Closely related Ollama integration work touching model discovery, health checks, adapters, and UI settings.
  • coleam00/Archon#681 — Related hybrid search multi-dimensional RPC changes and corresponding DB migration work.

Suggested labels

enhancement

Suggested reviewers

  • tazmon95
  • coleam00
  • leex279

Poem

🐰 I found a token and tucked it away,

Rag settings hum with a brighter day.
From checkbox click to a backend ping,
Ollama answers when the bell does ring.
A hopping rabbit cheers—code stitched and gay. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add Ollama Support with Local Model Discovery and Embeddings' accurately and concisely describes the main feature addition in the PR—Ollama integration with model discovery and embedding capabilities.
Description check ✅ Passed The PR description provides a comprehensive summary with features, technical changes, configuration details, and test plan. It covers the template sections including summary, changes made, type of change (new feature), affected services (frontend, server, database), testing, and additional notes. However, the test evidence section lacks specific command outputs and the testing checklist is only partially checked.
Docstring Coverage ✅ Passed Docstring coverage is 86.05% which is sufficient. The required threshold is 80.00%.

coderabbitai[bot] avatar Nov 21 '25 18:11 coderabbitai[bot]

@coderabbitai review

Milofax avatar Nov 21 '25 20:11 Milofax

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot] avatar Nov 21 '25 20:11 coderabbitai[bot]

Sorry, I forgot to clean the todo.md. This was just for the implementation of this Ollama fix/overhaul. On 23 Nov 2025 at 20:12 +0100, the following was written:

placement.

This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository

Milofax avatar Nov 23 '25 19:11 Milofax

#870

Wirasm avatar Nov 24 '25 09:11 Wirasm

@Milofax, we already supported Ollama and have all the capabilities listed in this PR except for the API mode selection. I don't see a need for that one, as we work with the native Ollama method, and I've never heard of someone using Ollama via the OpenAI specification when the native one is available.

tazmon95 avatar Dec 03 '25 16:12 tazmon95

Hey @tazmon95, thanks for the feedback! Let me provide some context on why this PR exists.

To be honest, I wouldn't have touched the Ollama implementation at all if it had worked for my use case. The changes here weren't about adding features for the sake of it; they came from real issues I encountered:

  1. Remote Ollama instances weren't properly supported. The original implementation was essentially hardcoded to localhost / host.docker.internal. For anyone running Ollama on a separate server (which is common for GPU offloading or shared infrastructure), this didn't work reliably.

  2. Auth token support was missing. Protected Ollama instances (behind reverse proxies with authentication) had no way to pass credentials through the embedding and health-check flows.

  3. The API mode toggle exists because the original code used OpenAI-compatible mode. I didn't add the toggle because I wanted it; it's there for backwards compatibility. The existing codebase was calling Ollama via the OpenAI-compatible /v1/embeddings endpoint. We added the native /api/embeddings adapter (which is more reliable for Ollama) but kept the old approach as a fallback option.

If the toggle adds unnecessary complexity and nobody uses the OpenAI-compatible mode for Ollama, I'm totally fine with removing it and just defaulting to native. The core value of this PR is really about making remote/authenticated Ollama setups work properly.
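
For anyone weighing the two modes, the practical difference is just the endpoint and payload shape. An illustrative comparison (URL and model are placeholders):

  # Illustrative comparison of the two embedding modes (placeholder URL/model).
  import httpx

  base = "http://localhost:11434"

  # Native Ollama API: POST /api/embeddings with a "prompt" field.
  native = httpx.post(
      f"{base}/api/embeddings",
      json={"model": "nomic-embed-text", "prompt": "hello"},
  ).json()["embedding"]

  # OpenAI-compatible mode: POST /v1/embeddings with an "input" field.
  # Ollama ignores the API key, but OpenAI-style clients require one,
  # hence placeholder values like "required-but-ignored".
  compat = httpx.post(
      f"{base}/v1/embeddings",
      headers={"Authorization": "Bearer required-but-ignored"},
      json={"model": "nomic-embed-text", "input": "hello"},
  ).json()["data"][0]["embedding"]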

Happy to discuss or simplify where it makes sense! 🙂

Milofax avatar Dec 03 '25 17:12 Milofax

Thanks @Milofax ,

  1. I've always used Ollama with remote hosts, never with the host.docker.internal address. This has worked fine since the beginning.

  2. This is a good addition; I hadn't tried it before.

  3. Didn't realize that; I must be mistaken then. I'll take a look. It makes more sense to me to use the native method.

tazmon95 avatar Dec 03 '25 20:12 tazmon95