🐛 [Bug]:
Archon Version
0.1.0
Bug Severity
🟠 High - Blocks important features
Bug Description
I'm excited about this project and have spent a few weeks trying to get it set up and integrated with my AI tools on my Synology :-)
The goal is to use it with Augment Code for better coding.
I was trying to scrape the following site, but I can't get it to extract any code examples. I have added an OpenAI API key. Is this format not supported? Or is my install broken? https://developers.3commas.io/
I also can't scrape Swagger UI sites like this: https://litellm-api.up.railway.app/
So far, I haven't been able to extract any code examples from any of the 9 sites I have added.
Am I missing anything, and what exactly should it support?
Whenever I try scraping, it just says "Crawling completed" and I see no way to access the data. The MCP claims there are no code examples for anything.
I do not see any logs that tell me anything.
Steps to Reproduce
- Go to Knowledge Base page
- Click "Add Knowledge"
- Enter URL: https://developers.3commas.io/
- Click "Add Source"
- It crawls the site successfully but skips everything having to do with code examples.
Expected Behavior
That it successfully extracts code examples.
Actual Behavior
No code examples extracted.
Error Details (if any)
Full crawl/status logs and error report
Below are the complete container logs, filtered to the two sources you asked for:
“3 e-commerce” = 3Commas developers
“light LLM” = LiteLLM docs
I included the full crawl sequence, extraction, storage phases, and all error lines relevant to each.
LiteLLM docs (https://docs.litellm.ai/, source_id=b124a988b930da30)
Progress/Crawl ID seen: ddeb9524-2e27-4e34-b239-f3b7b4db410f (earlier crawl)
Subsequent refresh (today) also started: 22d9356b-2a49-4171-8531-0e3001610574
Logs:
===== LiteLLM (docs.litellm.ai) full crawl log =====
2025-09-17 18:57:11 | root | INFO | Starting knowledge item crawl | url=https://docs.litellm.ai/ | knowledge_type=technical | tags=[]
2025-09-17 18:57:11 | root | INFO | Crawl started successfully | progress_id=ddeb9524-2e27-4e34-b239-f3b7b4db410f | url=https://docs.litellm.ai/
2025-09-17 18:57:12 | root | INFO | Acquired crawl semaphore | progress_id=ddeb9524-2e27-4e34-b239-f3b7b4db410f | url=https://docs.litellm.ai/
2025-09-17 18:57:12 | root | INFO | Starting crawl with progress tracking | progress_id=ddeb9524-2e27-4e34-b239-f3b7b4db410f | url=https://docs.litellm.ai/
2025-09-17 18:57:12 | root | INFO | Starting background crawl orchestration | url=https://docs.litellm.ai/
2025-09-17 18:57:12 | root | INFO | Starting async crawl orchestration | url=https://docs.litellm.ai/ | task_id=ddeb9524-2e27-4e34-b239-f3b7b4db410f
2025-09-17 18:57:12 | root | INFO | Generated unique source_id 'b124a988b930da30' and display name 'Litellm Documentation' from URL 'https://docs.litellm.ai/'
[FETCH]... ↓ https://docs.litellm.ai/ | ✓ | ⏱: 9.08s
[SCRAPE].. ◆ https://docs.litellm.ai/ | ✓ | ⏱: 0.82s
[COMPLETE] ● https://docs.litellm.ai/ | ✓ | ⏱: 9.90s
[FETCH]... ↓ https://docs.litellm.ai/docs/proxy/users | ✓ | ⏱: 36.12s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/proxy/users | ✓ | ⏱: 4.58s
[COMPLETE] ● https://docs.litellm.ai/docs/proxy/users | ✓ | ⏱: 40.72s
[FETCH]... ↓ https://docs.litellm.ai/docs/integrations | ✓ | ⏱: 44.71s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/integrations | ✓ | ⏱: 0.28s
[COMPLETE] ● https://docs.litellm.ai/docs/integrations | ✓ | ⏱: 45.00s
[FETCH]... ↓ https://docs.litellm.ai/docs/completion/output | ✓ | ⏱: 48.00s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/completion/output | ✓ | ⏱: 2.94s
[COMPLETE] ● https://docs.litellm.ai/docs/completion/output | ✓ | ⏱: 50.95s
[FETCH]... ↓ https://docs.litellm.ai/docs/observability/callbacks | ✓ | ⏱: 52.31s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/observability/callbacks | ✓ | ⏱: 1.84s
[COMPLETE] ● https://docs.litellm.ai/docs/observability/callbacks | ✓ | ⏱: 54.17s
[FETCH]... ↓ https://docs.litellm.ai/docs/enterprise | ✓ | ⏱: 61.15s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/enterprise | ✓ | ⏱: 1.52s
[COMPLETE] ● https://docs.litellm.ai/docs/enterprise | ✓ | ⏱: 62.69s
[FETCH]... ↓ https://docs.litellm.ai/docs/exception_mapping | ✓ | ⏱: 68.26s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/exception_mapping | ✓ | ⏱: 2.02s
[COMPLETE] ● https://docs.litellm.ai/docs/exception_mapping | ✓ | ⏱: 70.28s
[FETCH]... ↓ https://docs.litellm.ai/docs/observability/custom_callback | ✓ | ⏱: 74.02s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/observability/custom_callback | ✓ | ⏱: 3.37s
[COMPLETE] ● https://docs.litellm.ai/docs/observability/custom_callback | ✓ | ⏱: 77.41s
[FETCH]... ↓ https://docs.litellm.ai/docs/proxy/docker_quick_start | ✓ | ⏱: 82.44s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/proxy/docker_quick_start | ✓ | ⏱: 1.93s
[COMPLETE] ● https://docs.litellm.ai/docs/proxy/docker_quick_start | ✓ | ⏱: 84.38s
[FETCH]... ↓ https://docs.litellm.ai/docs/ | ✓ | ⏱: 85.60s
[SCRAPE].. ◆ https://docs.litellm.ai/docs/ | ✓ | ⏱: 4.01s
[COMPLETE] ● https://docs.litellm.ai/docs/ | ✓ | ⏱: 89.63s
[FETCH]... ↓ https://docs.litellm.ai/proxy/docker_quick_start.md | ✓ | ⏱: 39.78s
[SCRAPE].. ◆ https://docs.litellm.ai/proxy/docker_quick_start.md | ✓ | ⏱: 0.17s
[COMPLETE] ● https://docs.litellm.ai/proxy/docker_quick_start.md | ✓ | ⏱: 39.96s
[ERROR]... × https://docs.litellm.ai/release_notes | Error:
2025-09-17 18:59:12 | recursive | WARNING | Failed to crawl https://docs.litellm.ai/release_notes: Unexpected error in _crawl_web …
2025-09-17 18:59:43 | root | INFO | Found 1 unique source_ids: ['b124a988b930da30']
2025-09-17 18:59:43 | search | INFO | Generating summary for b124a988b930da30 using model: gpt-4.1-nano
2025-09-17 18:59:53 | root | INFO | Code extraction starting | url_to_full_document=16 | source_id=b124a988b930da30
… many “Document content check”, “HTML extraction”, “Found X code blocks from HTML/markdown” lines …
2025-09-17 19:01:07 | httpx | INFO | DELETE archon_code_examples?url=… (cleanup before insert) OK
2025-09-17 19:01:17 | search | ERROR | Failed to insert individual record for URL https://docs.litellm.ai/: {'message': "Could not find the 'embedding' column of 'archon_code_examples' in the schema cache", 'code': 'PGRST204'}
… repeated PGRST204 errors for many LiteLLM URLs …
2025-09-17 19:24:54 | root | INFO | Code example counts: … 'b124a988b930da30': 0
Key LiteLLM errors observed:
PostgREST schema cache mismatch: “Could not find the 'embedding' column of 'archon_code_examples'” (PGRST204). This is a Supabase/PostgREST cache invalidation issue after schema changes (the table has the column; PostgREST cached schema doesn’t).
Earlier runs used invalid chat model name (“gpt-4.1-nano”) before I fixed MODEL_CHOICE to gpt-4o-mini. The current refresh is running under corrected settings.
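The PGRST204 payload in the log above can be recognized in code. This is a minimal sketch, assuming the error arrives as a dict shaped like the one logged; `diagnose_postgrest_error` is a hypothetical helper, not part of Archon. `NOTIFY pgrst, 'reload schema'` is PostgREST's documented way to refresh its schema cache, but whether a stale cache or a migration that was never applied is the actual cause here is not established by these logs.

```python
import re
from typing import Optional

# Statement PostgREST listens for in order to refresh its schema cache.
RELOAD_SCHEMA_SQL = "NOTIFY pgrst, 'reload schema';"

# Matches messages such as:
#   Could not find the 'embedding' column of 'archon_code_examples' in the schema cache
_MISSING_COLUMN = re.compile(
    r"Could not find the '(?P<column>[^']+)' column of '(?P<table>[^']+)' "
    r"in the schema cache"
)


def diagnose_postgrest_error(error: dict) -> Optional[dict]:
    """Return the table and column named by a PGRST204 error, or None."""
    if error.get("code") != "PGRST204":
        return None
    match = _MISSING_COLUMN.search(error.get("message", ""))
    if match is None:
        return None
    return {
        "table": match.group("table"),
        "column": match.group("column"),
        # Suggested checks only (an assumption, not a confirmed diagnosis).
        "checks": [
            "confirm the column exists in the database (migration applied?)",
            "if it exists, run: " + RELOAD_SCHEMA_SQL,
        ],
    }
```

Passing the dict from the log line returns the table `archon_code_examples` and the column `embedding`; any other error code returns `None`.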
Status now (retrieval):
GET /api/knowledge-items/b124a988b930da30/code-examples → count: 0
MCP search for LiteLLM examples → 0 results (expected until embedding insert succeeds)
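The retrieval check above can be scripted for repeated runs. This is a sketch under assumptions: the base URL is the main server address listed in this report, the path is the one shown in the status line, and the layout of the JSON response is not shown anywhere in the logs, so the body is returned unchanged rather than parsed for a count. Both function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8181"  # main server address as listed in this report


def code_examples_url(source_id: str, base_url: str = BASE_URL) -> str:
    """Build the code-examples endpoint URL for one knowledge source."""
    root = base_url.rstrip("/")
    return f"{root}/api/knowledge-items/{quote(source_id, safe='')}/code-examples"


def fetch_code_examples(source_id: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET the endpoint and return the decoded JSON body as-is."""
    with urlopen(code_examples_url(source_id, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_code_examples("b124a988b930da30")` against a running server would return whatever the endpoint reports for the LiteLLM source.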
3Commas developers (https://developers.3commas.io/, source_id=14ddb57c217262f2)
Progress/Crawl IDs observed: 156fdd08-debf-4b6c-a677-bd337500a1d5 (crawl), and refresh ID 39ce3ac0-4046-48bf-adce-b3791d95c388
Logs:
===== 3Commas (developers.3commas.io) full crawl log =====
2025-09-17 19:12:22 | root | INFO | Acquired crawl semaphore for refresh | source_id=14ddb57c217262f2
2025-09-17 19:12:24 | root | INFO | Starting async crawl orchestration | url=https://developers.3commas.io/ | task_id=156fdd08-debf-4b6c-a677-bd337500a1d5
[FETCH]/[SCRAPE]/[COMPLETE] lines for multiple pages (/, dca-bot, search, grid-bot, account, market-data, quick-start, simple-trading, smart-trade, etc.) all OK
2025-09-17 19:15:26 | root | INFO | Found 1 unique source_ids: ['14ddb57c217262f2']
2025-09-17 19:15:26 | search | INFO | Generating summary for 14ddb57c217262f2 using model: gpt-4.1-nano
2025-09-17 19:15:39 | search | INFO | Updated source 14ddb57c217262f2 … (metadata preserved)
2025-09-17 19:15:39 | httpx | DELETE archon_crawled_pages?url=in.(…) OK (cleanup before re-insert)
2025-09-17 19:15:54 | search | WARNING | Error inserting batch (attempt 1/3): duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"
2025-09-17 19:15:57 | search | WARNING | Error inserting batch (attempt 2/3): duplicate key …
2025-09-17 19:15:59 | search | ERROR | Failed to insert batch after 3 attempts: duplicate key …
2025-09-17 19:16:00 | search | ERROR | Direct PostgREST insert failed: 409 - duplicate key …
2025-09-17 19:16:00 | search | ERROR | Failed individual insert for various URLs: duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"
2025-09-17 19:16:00 | root | INFO | Code extraction starting | url_to_full_document=15 | source_id=14ddb57c217262f2
… many “Document content check” lines, but notably “has_backticks=False” on pages (few/no code blocks) …
2025-09-17 19:26:59 | recursive | WARNING | Failed to crawl https://developers.3commas.io/: maximum recursion depth exceeded
2025-09-17 19:26:59 | root | INFO | Direct fetch via aiohttp succeeded | url=https://developers.3commas.io/ | html_len=38801
2025-09-17 19:27:03 | root | INFO | Code example counts: … '14ddb57c217262f2': 0
… ongoing periodic checks show counts remain 0 …
Key 3Commas errors observed:
Duplicate key violations inserting archon_crawled_pages chunks (url, chunk_number) for existing rows.
Recrawl encountered “maximum recursion depth exceeded” once; fallback direct fetch succeeded.
Pages show no markdown backticks and HTML extraction didn’t yield code blocks; thus 0 traditional examples. LLM-based synthesis should run when extraction yields none, but earlier the pipeline was blocked by model misconfig; now fixed.
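The constraint name `archon_crawled_pages_url_chunk_number_key` suggests uniqueness on the pair (url, chunk_number). One way a batch can trip it is by carrying the same pair twice. The sketch below removes such repeats before insertion; `dedupe_chunks` is a hypothetical helper and the row layout is assumed. It does not handle rows that already exist in the table, which would need a completed delete or an upsert instead.

```python
def dedupe_chunks(rows: list[dict]) -> list[dict]:
    """Keep one row per (url, chunk_number), preferring the last one seen."""
    unique: dict[tuple[str, int], dict] = {}
    for row in rows:
        # A later row with the same key overwrites the earlier one,
        # while the key keeps its original position in the output.
        unique[(row["url"], row["chunk_number"])] = row
    return list(unique.values())
```

If the batch is already free of repeats, the function returns it unchanged, which would point at leftover rows in the table rather than at the batch itself.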
Current progress checks (now)
Direct API
LiteLLM docs: GET /api/knowledge-items/b124a988b930da30/code-examples → count: 0
3Commas: GET /api/knowledge-items/14ddb57c217262f2/code-examples → count: 0
MCP search (code-examples)
LiteLLM “chat completions curl”: 0 results
3Commas “smart trade create curl”: 0 results
These “0” results are consistent with the errors above (LiteLLM storage blocked by PostgREST schema cache; 3Commas has duplicates on page storage and no traditional code blocks; synthesis not stored yet).
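To reproduce the "no traditional code blocks" finding outside the pipeline, the two checks named in the logs (markdown backticks and HTML extraction) can be approximated with the standard library. This is a rough sketch, not Archon's actual extraction logic; the function and class names are hypothetical, and only fenced markdown blocks and `<pre>` elements are considered.

```python
import re
from html.parser import HTMLParser

# A fence is three backticks, an optional info string, then the block body.
FENCE = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)


def markdown_code_blocks(markdown: str) -> list[str]:
    """Return the bodies of fenced code blocks found in markdown text."""
    return [body.strip("\n") for body in FENCE.findall(markdown)]


class PreCollector(HTMLParser):
    """Collect the text content of every top-level <pre> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._depth = 0
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def html_code_blocks(html: str) -> list[str]:
    """Return the text of <pre> blocks present in static HTML."""
    parser = PreCollector()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Running both functions on a page's saved markdown and raw HTML shows whether any code is present before rendering. Empty lists from both would match the `has_backticks=False` lines in the log.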
Affected Component
🔍 Knowledge Base / RAG
Browser & OS
Brave, newest version, but happens in any browser. The server itself is hosted on a Synology running DSM 7.2.1
Additional Context
No response
Service Status (check all that are working)
- [x] 🖥️ Frontend UI (http://localhost:3737)
- [x] ⚙️ Main Server (http://localhost:8181)
- [x] 🔗 MCP Service (localhost:8051)
- [x] 🤖 Agents Service (http://localhost:8052)
- [x] 💾 Supabase Database (connected)
“Could not find the 'embedding' column of 'archon_code_examples'”
Have you updated your Supabase database tables with the migration script? It seems like the database isn't set up properly or isn't up to date with the requirements of your Archon version.
I also can't scrape Swagger UI sites like this: https://litellm-api.up.railway.app/
I suppose this is mainly because the site lazy-loads everything. When you inspect the source of the page, there are no code blocks. Only after you open an endpoint "accordion item" does the code appear in the source.
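Swagger UI renders its pages from an OpenAPI document, so when the rendered HTML is empty, the endpoints can sometimes be read from that JSON instead. Whether this particular site exposes its spec, and at which path, has not been verified here. The sketch below only shows the idea of turning an already loaded spec dict into curl one-liners; `curl_examples` is a hypothetical helper and not something Archon provides.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def curl_examples(spec: dict, base_url: str) -> list[str]:
    """Build one curl line per operation listed under the spec's "paths"."""
    root = base_url.rstrip("/")
    examples = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            examples.append(f"curl -X {method.upper()} '{root}{path}'")
    return examples
```

The output contains only method and URL; request bodies and auth headers would have to come from the spec's schemas and security sections, which this sketch ignores.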
Still same issue on my side: https://github.com/coleam00/Archon/issues/716#issuecomment-3393457130
I am happy to report that with the newest version installed from today on a new server it scrapes https://developers.3commas.io/ absolutely perfectly! Awesome!
However, the MCP doesn't work now on Dokploy due to bug #779.
@DmacMcgreg Could you update and check that you have entered an OpenAI API key under API Keys and RAG Settings, and that all Code Extraction settings are on?
This didn't work for me earlier, but in the newest version it does.