Add Ollama Support with Local Model Discovery and Embeddings
Summary
This PR adds comprehensive Ollama support to Archon, enabling users to use local LLM models for both chat and embeddings.
Features Added:
- Ollama Integration: Full support for Ollama as LLM and embedding provider
- Model Discovery: Automatic detection of available Ollama models via API
- UI Configuration: New OllamaConfigurationPanel for easy setup in Settings
- RAG Settings Update: Support for Ollama embeddings in RAG pipeline
- API Mode Selection: Native Ollama API or OpenAI-compatible mode
- Documentation: Added INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md
Technical Changes:
- `python/src/server/api_routes/ollama_api.py` - Extended API endpoints
- `python/src/server/services/embeddings/embedding_service.py` - Ollama embedding support
- `python/src/server/services/llm_provider_service.py` - Ollama LLM provider
- `python/src/server/services/ollama/model_discovery_service.py` - Model discovery
- `archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx` - UI panel
- `archon-ui-main/src/components/settings/RAGSettings.tsx` - Updated settings
Configuration:
Users can configure Ollama via the Settings page:
- Ollama Base URL (local or remote)
- API Mode (native or OpenAI-compatible)
- Embedding model selection
- Chat model selection
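For orientation, the persisted settings end up looking roughly like the sketch below. Only `OLLAMA_API_MODE`, `CHAT_MODEL`, `OLLAMA_CHAT_AUTH_TOKEN`, and `OLLAMA_EMBEDDING_AUTH_TOKEN` are key names taken from this PR; the remaining keys and all values are illustrative assumptions.

```python
# Hypothetical shape of the Ollama-related ragSettings entries (values are examples only).
ollama_rag_settings = {
    "OLLAMA_BASE_URL": "http://192.168.1.50:11434",   # assumed key name; local or remote instance
    "OLLAMA_API_MODE": "native",                      # "native" or OpenAI-compatible
    "CHAT_MODEL": "llama3.1:8b",
    "EMBEDDING_MODEL": "nomic-embed-text",            # assumed key name
    "OLLAMA_CHAT_AUTH_TOKEN": "example-token",        # only needed behind an authenticating proxy
    "OLLAMA_EMBEDDING_AUTH_TOKEN": "example-token",
}
```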
Test plan
- [x] Tested with local Ollama instance
- [x] Tested with remote Ollama server (with auth token)
- [x] Verified model discovery works correctly
- [x] Verified embeddings generation with Ollama models
- [ ] E2E tests included in `archon-ui-main/tests/e2e/ollama-api-mode.spec.ts`
🤖 Generated with Claude Code
Summary by CodeRabbit
- New Features
  - Per-instance authentication for Ollama (chat & embeddings), an Ollama API mode switch, native Ollama embeddings, and automatic model discovery.
- Documentation
  - Added a comprehensive infrastructure guide and a bilingual quick-start.
- Tests
  - New E2E and expanded unit/integration tests covering API modes and auth-token flows.
- Chores
  - Docker services now restart automatically; added Playwright dev dependency; updated ignore rules to exclude Supabase local data and test results.
- Style
  - Removed unused icon imports and minor UI refinements.
Walkthrough
Adds per-instance Ollama authentication (frontend persistence, UI, and backend propagation), a native Ollama embeddings adapter, auth-aware model discovery and health checks, Supabase infra docs/quickstart, Playwright/e2e and unit tests, Docker restart policies, multi-dimensional search RPCs, and PDF/OCR extraction improvements.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Docs & Repo config: `.gitignore`, `archon-ui-main/.gitignore`, `INFRASTRUCTURE.md`, `QUICKSTART.md`, `PLAN.md` | Add Supabase ignore patterns; add INFRASTRUCTURE and QUICKSTART guides; document Ollama auth changes and deployment/health-check notes. |
| Frontend: RAG & Ollama config UI & types: `archon-ui-main/src/components/settings/RAGSettings.tsx`, `archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx`, `archon-ui-main/src/components/settings/types/OllamaTypes.ts`, `archon-ui-main/src/services/credentialsService.ts` | Add per-instance useAuth/authToken fields and UI controls, persist tokens into ragSettings (`OLLAMA_CHAT_AUTH_TOKEN`, `OLLAMA_EMBEDDING_AUTH_TOKEN`), and update related types (including `CHAT_MODEL`, `OLLAMA_API_MODE`). |
| Frontend: utilities & tests: `archon-ui-main/src/components/settings/utils/instanceConfigSync.ts`, `archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts` | New `syncEmbeddingFromLLM` utility and unit tests covering name/url/useAuth/authToken sync and edge cases. |
| Frontend: e2e & build deps: `archon-ui-main/package.json`, `archon-ui-main/tests/e2e/ollama-api-mode.spec.ts` | Add `@playwright/test` devDependency and a new Playwright e2e test validating Ollama API mode UI flows and persistence. |
| Frontend: minor import cleanup & ignore: `archon-ui-main/src/components/*`, `archon-ui-main/.gitignore` | Remove unused lucide-react imports across components; add `test-results/` to .gitignore. |
| Backend: API routes & token mapping: `python/src/server/api_routes/ollama_api.py` | Derive per-URL tokens from `rag_strategy`, normalize instance URLs, map tokens to instances, and pass tokens into discovery/health/validate endpoints with logging. |
| Backend: model discovery & health checks: `python/src/server/services/ollama/model_discovery_service.py` | Add an optional `auth_token` to `discover_models`/`check_instance_health` and support an `auth_tokens` map for multi-instance discovery; attach an Authorization header when a token is provided. |
| Backend: LLM provider adjustments: `python/src/server/services/llm_provider_service.py` | Select the Ollama auth token based on operation (chat vs embedding), default to "required-but-ignored" when absent, resolve the embedding URL (`/v1`) as needed, and extend `validate_provider_instance` to accept `auth_token`. |
| Backend: embeddings adapter: `python/src/server/services/embeddings/embedding_service.py` | Add `NativeOllamaEmbeddingAdapter` for Ollama's native `/api/embeddings` with optional Bearer token; adapter selection respects `OLLAMA_API_MODE` (native vs OpenAI-compatible). See the sketch after this table. |
| Backend: search & storage: `python/src/server/services/search/*`, `python/src/server/services/storage/base_storage_service.py` | Switch hybrid search to multi-dimensional RPCs (passing `embedding_dimension`); improve chunk-splitting logic to avoid breaking code blocks and to prefer headings/paragraphs. |
| Backend: PDF/OCR processing & deps: `python/src/server/utils/document_processing.py`, `python/src/server/utils/ocr_processing.py`, `python/pyproject.toml` | Add OCR-based extraction (pytesseract/pdf2image) with pymupdf4llm/pdfplumber/PyPDF2 fallbacks; add OCR dependencies to pyproject. |
| Backend: tests & minor changes: `python/tests/*`, `python/src/agent_work_orders/utils/state_reconciliation.py`, `python/tests/test_ollama_auth_token.py`, `python/tests/test_async_llm_provider_service.py` | Add tests for Ollama auth propagation and llm-provider scenarios; whitespace-only reflow in `state_reconciliation.py`. |
| Docker: `docker-compose.yml` | Add `restart: unless-stopped` to the archon-server, archon-mcp, archon-agents, and archon-frontend services. |
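To make the embeddings row concrete, here is a minimal sketch of what a native-Ollama embedding call with an optional Bearer token could look like. The class and method names are illustrative; only the `/api/embeddings` endpoint, the `OLLAMA_API_MODE` switch, and the optional Authorization header come from this PR.

```python
import httpx


class NativeOllamaEmbeddingSketch:
    """Illustrative stand-in for the NativeOllamaEmbeddingAdapter described above."""

    def __init__(self, base_url: str, model: str, auth_token: str | None = None):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.auth_token = auth_token

    async def embed(self, text: str) -> list[float]:
        # Attach a Bearer token only when the instance is protected (e.g. behind a reverse proxy).
        headers = {"Authorization": f"Bearer {self.auth_token}"} if self.auth_token else {}
        async with httpx.AsyncClient(timeout=30.0) as client:
            resp = await client.post(
                f"{self.base_url}/api/embeddings",  # native Ollama embeddings endpoint
                json={"model": self.model, "prompt": text},
                headers=headers,
            )
            resp.raise_for_status()
            return resp.json()["embedding"]
```

When `OLLAMA_API_MODE` selects the OpenAI-compatible mode instead, the existing `/v1/embeddings` path is used, as noted in the adapter row above.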
Sequence Diagram(s)
```mermaid
sequenceDiagram
    actor User
    participant UI as Frontend (RAGSettings / OllamaConfig)
    participant API as Archon API (ollama_api)
    participant Discovery as ModelDiscoveryService
    participant Ollama as Ollama Instance
    User->>UI: Configure Ollama URL + useAuth + authToken
    UI->>API: Save ragSettings (includes per-instance tokens)
    User->>UI: Trigger health check / discover models
    UI->>API: GET /health or /discover with instance URLs
    API->>API: Normalize URLs, map tokens from rag_strategy
    API->>Discovery: check_instance_health/discover_models(url, auth_token)
    Discovery->>Ollama: HTTP request (Authorization: Bearer token if provided)
    Ollama-->>Discovery: health/models response
    Discovery-->>API: aggregated result
    API-->>UI: health/discovery result
    UI-->>User: display status and available models
```
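The auth-aware part of that flow boils down to the discovery service conditionally attaching the header before calling the instance. A minimal sketch under stated assumptions: the function names are illustrative, and only the `/api/tags` model-listing endpoint and the Bearer header reflect standard Ollama behavior; the rest mirrors the diagram.

```python
import httpx


async def discover_models_sketch(instance_url: str, auth_token: str | None = None) -> list[str]:
    """Illustrative auth-aware model discovery against one Ollama instance."""
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(f"{instance_url.rstrip('/')}/api/tags", headers=headers)
        resp.raise_for_status()
        # /api/tags returns {"models": [{"name": "llama3.1:8b", ...}, ...]}
        return [m["name"] for m in resp.json().get("models", [])]


async def check_instance_health_sketch(instance_url: str, auth_token: str | None = None) -> bool:
    """Health check: a protected instance should answer 200 only when the token is valid."""
    headers = {"Authorization": f"Bearer {auth_token}"} if auth_token else {}
    async with httpx.AsyncClient(timeout=5.0) as client:
        try:
            resp = await client.get(f"{instance_url.rstrip('/')}/api/tags", headers=headers)
            return resp.status_code == 200
        except httpx.HTTPError:
            return False
```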
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
- Areas needing extra attention:
  - `python/src/server/services/embeddings/embedding_service.py` (NativeOllamaEmbeddingAdapter: concurrency, error mapping, auth handling)
  - `python/src/server/services/llm_provider_service.py` (token selection between chat vs embedding, fallback behavior)
  - `python/src/server/api_routes/ollama_api.py` (URL normalization and token mapping correctness)
  - Frontend token lifecycle and persistence: `RAGSettings.tsx`, `OllamaConfigurationPanel.tsx`, `credentialsService.ts`
  - PDF/OCR integration and fallbacks: `python/src/server/utils/document_processing.py`, `python/src/server/utils/ocr_processing.py`
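For reviewers looking at the chat-vs-embedding token selection, the intended behavior as described in the walkthrough can be summarized with a sketch like the following. The function name is illustrative; the settings keys and the "required-but-ignored" default come from the change description.

```python
def select_ollama_auth_token_sketch(operation: str, rag_settings: dict) -> str:
    """Pick the per-operation Ollama token; illustrative, not the actual service code."""
    key = "OLLAMA_EMBEDDING_AUTH_TOKEN" if operation == "embedding" else "OLLAMA_CHAT_AUTH_TOKEN"
    token = rag_settings.get(key)
    # OpenAI-style clients insist on a non-empty API key even though Ollama ignores it,
    # hence the "required-but-ignored" placeholder when no token is configured.
    return token or "required-but-ignored"
```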
Possibly related PRs
- coleam00/Archon#643 — Overlapping Ollama integration changes (frontend/backend auth token plumbing and model discovery).
- coleam00/Archon#560 — Closely related Ollama integration work touching model discovery, health checks, adapters, and UI settings.
- coleam00/Archon#681 — Related hybrid search multi-dimensional RPC changes and corresponding DB migration work.
Suggested labels
enhancement
Suggested reviewers
- tazmon95
- coleam00
- leex279
Poem
🐰 I found a token and tucked it away,
Rag settings hum with a brighter day.
From checkbox click to a backend ping,
Ollama answers when the bell does ring.
A hopping rabbit cheers—code stitched and gay. 🥕
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Add Ollama Support with Local Model Discovery and Embeddings' accurately and concisely describes the main feature addition in the PR—Ollama integration with model discovery and embedding capabilities. |
| Description check | ✅ Passed | The PR description provides a comprehensive summary with features, technical changes, configuration details, and test plan. It covers the template sections including summary, changes made, type of change (new feature), affected services (frontend, server, database), testing, and additional notes. However, the test evidence section lacks specific command outputs and the testing checklist is only partially checked. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.05% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
@coderabbitai review
✅ Actions performed
Review triggered.
Sorry, I forgot to clean up the todo.md. It was just for the implementation of this Ollama fix/overhaul.
This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository?
#870
@Milofax, we already supported Ollama and have all the capabilities listed in this PR except for the Ollama API selection capability. I don't see a need for it, as we work with the native Ollama method, and I've never heard of someone using Ollama via the OpenAI specification when the native one is available.
Hey @tazmon95, thanks for the feedback! Let me provide some context on why this PR exists.
To be honest, I wouldn't have touched the Ollama implementation at all if it had worked for my use case. The changes here weren't about adding features for the sake of it; they came from real issues I encountered:
- **Remote Ollama instances weren't properly supported.** The original implementation was essentially hardcoded to localhost / `host.docker.internal`. For anyone running Ollama on a separate server (which is common for GPU offloading or shared infrastructure), this didn't work reliably.
- **Auth token support was missing.** Protected Ollama instances (behind reverse proxies with authentication) had no way to pass credentials through the embedding and health-check flows.
- **The API mode toggle exists because the original code used OpenAI-compatible mode.** I didn't add this toggle because I wanted it; it's there for backwards compatibility. The existing codebase was calling Ollama via the OpenAI-compatible `/v1/embeddings` endpoint. We added the native `/api/embeddings` adapter (which is more reliable for Ollama) but kept the old approach as a fallback option (see the sketch at the end of this comment).
If the toggle adds unnecessary complexity and nobody uses the OpenAI-compatible mode for Ollama, I'm totally fine with removing it and just defaulting to native. The core value of this PR is really about making remote/authenticated Ollama setups work properly.
Happy to discuss or simplify where it makes sense! 🙂
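For context, the backwards-compatibility toggle described above essentially just decides which endpoint the embedding call goes to. A minimal sketch, assuming a simple dispatch on the `OLLAMA_API_MODE` setting (the helper name and structure are illustrative, not the actual implementation):

```python
def resolve_embedding_endpoint_sketch(base_url: str, api_mode: str) -> str:
    """Illustrative: map OLLAMA_API_MODE to the endpoint used for embeddings."""
    base = base_url.rstrip("/")
    if api_mode == "native":
        return f"{base}/api/embeddings"   # native Ollama embeddings (new adapter)
    return f"{base}/v1/embeddings"        # OpenAI-compatible fallback (previous behavior)
```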
Thanks @Milofax,
- I've always used Ollama with remote hosts, never with the host.docker.internal address. This has worked fine since the beginning.
- This is a good addition, I didn't try it before.
- Didn't realize that, I must be mistaken then. I'll take a look; it makes more sense to me to use the native method.