
Add Ollama Support with Local Model Discovery and Embeddings

Open Milofax opened this issue 1 month ago • 5 comments

Summary

This PR adds comprehensive Ollama support to Archon, letting users run local LLM models for both chat and embeddings.

Features Added:

  • Ollama Integration: Full support for Ollama as LLM and embedding provider
  • Model Discovery: Automatic detection of available Ollama models via API
  • UI Configuration: New OllamaConfigurationPanel for easy setup in Settings
  • RAG Settings Update: Support for Ollama embeddings in RAG pipeline
  • API Mode Selection: Native Ollama API or OpenAI-compatible mode
  • Documentation: Added INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md

Technical Changes:

  • python/src/server/api_routes/ollama_api.py - Extended API endpoints
  • python/src/server/services/embeddings/embedding_service.py - Ollama embedding support
  • python/src/server/services/llm_provider_service.py - Ollama LLM provider
  • python/src/server/services/ollama/model_discovery_service.py - Model discovery
  • archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx - UI panel
  • archon-ui-main/src/components/settings/RAGSettings.tsx - Updated settings

Configuration:

Users can configure Ollama via the Settings page:

  • Ollama Base URL (local or remote)
  • API Mode (native or OpenAI-compatible)
  • Embedding model selection
  • Chat model selection

Test plan

  • [x] Tested with local Ollama instance
  • [x] Tested with remote Ollama server (with auth token)
  • [x] Verified model discovery works correctly
  • [x] Verified embeddings generation with Ollama models
  • [ ] E2E tests included in archon-ui-main/tests/e2e/ollama-api-mode.spec.ts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-instance authentication for Ollama (chat & embeddings), Ollama API mode switch, native Ollama embeddings, and automatic model discovery.
  • Documentation

    • Added comprehensive infrastructure guide and a bilingual quick-start.
  • Tests

    • New E2E and expanded unit/integration tests covering API modes and auth-token flows.
  • Chores

    • Docker services now restart automatically; added Playwright dev dependency; updated ignore rules to exclude Supabase local data and test results.
  • Style

    • Removed unused icon imports and minor UI refinements.


Milofax avatar Nov 21 '25 18:11 Milofax

Walkthrough

Adds per-instance Ollama authentication (frontend persistence, UI, and backend propagation), a native Ollama embeddings adapter, auth-aware model discovery and health checks, Supabase infra docs/quickstart, Playwright/e2e and unit tests, Docker restart policies, multi-dimensional search RPCs, and PDF/OCR extraction improvements.

Changes

Cohort / File(s) Summary
Docs & Repo config
/.gitignore, archon-ui-main/.gitignore, INFRASTRUCTURE.md, QUICKSTART.md, PLAN.md
Add Supabase ignore patterns; add INFRASTRUCTURE and QUICKSTART guides; document Ollama auth changes and deployment/health-check notes.
Frontend: RAG & Ollama config (RAG/Ollama UI & types)
archon-ui-main/src/components/settings/RAGSettings.tsx, archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx, archon-ui-main/src/components/settings/types/OllamaTypes.ts, archon-ui-main/src/services/credentialsService.ts
Add per-instance useAuth/authToken fields, UI controls, persist tokens into ragSettings (OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN), and update related types (including CHAT_MODEL, OLLAMA_API_MODE).
Frontend: utilities & tests
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts, archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
New utility syncEmbeddingFromLLM and unit tests covering name/url/useAuth/authToken sync and edge cases.
Frontend: e2e & build deps
archon-ui-main/package.json, archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
Add @playwright/test devDependency and new Playwright e2e test validating Ollama API mode UI flows and persistence.
Frontend: minor import cleanup & ignore
archon-ui-main/src/components/*, archon-ui-main/.gitignore
Remove unused lucide-react imports across components; add test-results/ to .gitignore.
Backend: API routes & token mapping
python/src/server/api_routes/ollama_api.py
Derive per-URL tokens from rag_strategy, normalize instance URLs, map tokens to instances, and pass tokens into discovery/health/validate endpoints with logging.
Backend: model discovery & health checks
python/src/server/services/ollama/model_discovery_service.py
Add optional auth_token to discover_models/check_instance_health and support an auth_tokens map for multi-instance discovery; attach Authorization header when token provided.
Backend: LLM provider adjustments
python/src/server/services/llm_provider_service.py
Select Ollama auth token based on operation (chat vs embedding), default to "required-but-ignored" when absent, resolve embedding URL (/v1) as needed, and extend validate_provider_instance to accept auth_token.
Backend: Embeddings adapter
python/src/server/services/embeddings/embedding_service.py
Add NativeOllamaEmbeddingAdapter for Ollama native /api/embeddings with optional Bearer token; adapter selection respects OLLAMA_API_MODE (native vs OpenAI-compatible); a sketch of the call shape follows this table.
Backend: search & storage
python/src/server/services/search/*, python/src/server/services/storage/base_storage_service.py
Switch to multi-dimensional RPCs (pass embedding_dimension) for hybrid search RPCs; improve chunk-splitting logic to avoid breaking code blocks and prefer headings/paragraphs.
Backend: PDF/OCR processing & deps
python/src/server/utils/document_processing.py, python/src/server/utils/ocr_processing.py, python/pyproject.toml
Add OCR-based extraction (pytesseract/pdf2image) and pymupdf4llm/pdfplumber/PyPDF2 fallbacks; add OCR dependencies to pyproject.
Backend: tests & minor changes
python/tests/*, python/src/agent_work_orders/utils/state_reconciliation.py, python/tests/test_ollama_auth_token.py, python/tests/test_async_llm_provider_service.py
Add tests for Ollama auth propagation and llm-provider scenarios; whitespace-only reflow in state_reconciliation.
Docker
docker-compose.yml
Add restart: unless-stopped to archon-server, archon-mcp, archon-agents, and archon-frontend services.
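
For concreteness, here is a minimal sketch of what a native-mode embedding call with an optional per-instance Bearer token could look like. This is not the PR's actual NativeOllamaEmbeddingAdapter code; the function name and parameters are illustrative, and it assumes httpx plus Ollama's documented /api/embeddings request shape.

  # Minimal sketch (not the PR's NativeOllamaEmbeddingAdapter): a native
  # Ollama embedding call with an optional Bearer token. Assumes httpx.
  import httpx

  async def embed_native(
      base_url: str,
      model: str,
      text: str,
      auth_token: str | None = None,
  ) -> list[float]:
      # Attach Authorization only when a token is configured, mirroring
      # the per-instance auth behavior described in the table above.
      headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
      async with httpx.AsyncClient(base_url=base_url, headers=headers) as client:
          resp = await client.post(
              "/api/embeddings",
              json={"model": model, "prompt": text},
          )
          resp.raise_for_status()
          return resp.json()["embedding"]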

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant UI as Frontend (RAGSettings / OllamaConfig)
  participant API as Archon API (ollama_api)
  participant Discovery as ModelDiscoveryService
  participant Ollama as Ollama Instance

  User->>UI: Configure Ollama URL + useAuth + authToken
  UI->>API: Save ragSettings (includes per-instance tokens)
  User->>UI: Trigger health check / discover models
  UI->>API: GET /health or /discover with instance URLs
  API->>API: Normalize URLs, map tokens from rag_strategy
  API->>Discovery: check_instance_health/discover_models(url, auth_token)
  Discovery->>Ollama: HTTP request (Authorization: Bearer token if provided)
  Ollama-->>Discovery: health/models response
  Discovery-->>API: aggregated result
  API-->>UI: health/discovery result
  UI-->>User: display status and available models

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • python/src/server/services/embeddings/embedding_service.py (NativeOllamaEmbeddingAdapter: concurrency, error mapping, auth handling)
    • python/src/server/services/llm_provider_service.py (token selection between chat vs embedding, fallback behavior)
    • python/src/server/api_routes/ollama_api.py (URL normalization and token mapping correctness)
    • Frontend token lifecycle and persistence: RAGSettings.tsx, OllamaConfigurationPanel.tsx, credentialsService.ts
    • PDF/OCR integration and fallbacks: python/src/server/utils/document_processing.py, python/src/server/utils/ocr_processing.py
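
To make the chat-vs-embedding token selection called out above concrete, a hypothetical version of that logic (the real code is in llm_provider_service.py) could look like:

  # Hypothetical sketch of operation-based token selection; not the
  # actual llm_provider_service code.
  def select_ollama_token(settings: dict, operation: str) -> str:
      key = (
          "OLLAMA_EMBEDDING_AUTH_TOKEN"
          if operation == "embedding"
          else "OLLAMA_CHAT_AUTH_TOKEN"
      )
      # Ollama's OpenAI-compatible endpoint requires *some* API key even
      # though it ignores it, hence the documented fallback value.
      return settings.get(key) or "required-but-ignored"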

Possibly related PRs

  • coleam00/Archon#643 — Overlapping Ollama integration changes (frontend/backend auth token plumbing and model discovery).
  • coleam00/Archon#560 — Closely related Ollama integration work touching model discovery, health checks, adapters, and UI settings.
  • coleam00/Archon#681 — Related hybrid search multi-dimensional RPC changes and corresponding DB migration work.

Suggested labels

enhancement

Suggested reviewers

  • tazmon95
  • coleam00
  • leex279

Poem

🐰 I found a token and tucked it away,

Rag settings hum with a brighter day.
From checkbox click to a backend ping,
Ollama answers when the bell does ring.
A hopping rabbit cheers—code stitched and gay. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add Ollama Support with Local Model Discovery and Embeddings' accurately and concisely describes the main feature addition in the PR—Ollama integration with model discovery and embedding capabilities.
Description check ✅ Passed The PR description provides a comprehensive summary with features, technical changes, configuration details, and test plan. It covers the template sections including summary, changes made, type of change (new feature), affected services (frontend, server, database), testing, and additional notes. However, the test evidence section lacks specific command outputs and the testing checklist is only partially checked.
Docstring Coverage ✅ Passed Docstring coverage is 86.05% which is sufficient. The required threshold is 80.00%.

coderabbitai[bot] avatar Nov 21 '25 18:11 coderabbitai[bot]

@coderabbitai review

Milofax avatar Nov 21 '25 20:11 Milofax

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot] avatar Nov 21 '25 20:11 coderabbitai[bot]

Sorry, I forgot to clean the todo.md. This was just for the implementation of this Ollama fix/overhaul. On 23 Nov 2025 at 20:12 +0100, the following was written:

placement.

This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository

Milofax avatar Nov 23 '25 19:11 Milofax

#870

Wirasm avatar Nov 24 '25 09:11 Wirasm

@Milofax, we already supported Ollama and have all the capabilities listed in this PR except for the API mode selection. I don't see a need for that one, as we work with the native Ollama method, and I've never heard of someone using Ollama via the OpenAI specification when the native one is available.

tazmon95 avatar Dec 03 '25 16:12 tazmon95

Hey @tazmon95, thanks for the feedback! Let me provide some context on why this PR exists.

To be honest, I wouldn't have touched the Ollama implementation at all if it had worked for my use case. The changes here weren't about adding features for the sake of it; they came from real issues I encountered:

  1. Remote Ollama instances weren't properly supported. The original implementation was essentially hardcoded to localhost / host.docker.internal. For anyone running Ollama on a separate server (which is common for GPU offloading or shared infrastructure), this didn't work reliably.

  2. Auth token support was missing. Protected Ollama instances (behind reverse proxies with authentication) had no way to pass credentials through the embedding and health-check flows.

  3. The API mode toggle exists because the original code used OpenAI-compatible mode. I didn't add the toggle because I wanted it; it's there for backwards compatibility. The existing codebase was calling Ollama via the OpenAI-compatible /v1/embeddings endpoint. We added the native /api/embeddings adapter (which is more reliable for Ollama) but kept the old approach as a fallback option.

If the toggle adds unnecessary complexity and nobody uses the OpenAI-compatible mode for Ollama, I'm totally fine with removing it and just defaulting to native. The core value of this PR is really about making remote/authenticated Ollama setups work properly.
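
For anyone weighing the two modes, the practical difference is just the endpoint and payload shape. An illustrative comparison (URL and model are placeholders):

  # Illustrative comparison of the two embedding modes (placeholder URL/model).
  import httpx

  base = "http://localhost:11434"

  # Native Ollama API: POST /api/embeddings with a "prompt" field.
  native = httpx.post(
      f"{base}/api/embeddings",
      json={"model": "nomic-embed-text", "prompt": "hello"},
  ).json()["embedding"]

  # OpenAI-compatible mode: POST /v1/embeddings with an "input" field.
  # Ollama ignores the API key, but OpenAI-style clients require one,
  # hence placeholder values like "required-but-ignored".
  compat = httpx.post(
      f"{base}/v1/embeddings",
      headers={"Authorization": "Bearer required-but-ignored"},
      json={"model": "nomic-embed-text", "input": "hello"},
  ).json()["data"][0]["embedding"]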

Happy to discuss or simplify where it makes sense! 🙂

Milofax avatar Dec 03 '25 17:12 Milofax

Thanks @Milofax ,

  1. I've always used Ollama with remote hosts, never with the host.docker.internal address. This has worked fine since the beginning.

  2. This is a good addition; I hadn't tried it before.

  3. Didn't realize that; I must be mistaken then. I'll take a look. It makes more sense to me to use the native method.

tazmon95 avatar Dec 03 '25 20:12 tazmon95