
🐛 [Bug]: Knowledge Base Loses Track of Active Crawl

Open phill-bramble opened this issue 3 months ago • 2 comments

Archon Version

Main branch - pulled 5th Sept

Bug Severity

🟢 Low - Minor inconvenience

Bug Description

I started a recrawl following an update. The crawl status was displayed and reached the document store phase; however, after I deleted a few other knowledge base entries, the crawl progress disappeared. The server logs show the crawl is still in progress, but its status is no longer reported to the UI.

Steps to Reproduce

As per description

Expected Behavior

The crawl status is always displayed for a crawl in progress

Actual Behavior

The crawl continues in the background, but its progress is no longer shown in the UI.

Error Details (if any)

These errors may be unrelated, but I thought I would provide them anyway as it seems odd to have so many failures:


2025-09-05 09:31:47 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: openai

2025-09-05 09:31:47 | src.server.services.llm_provider_service | INFO | OpenAI client created successfully

2025-09-05 09:32:02 | search | INFO | Batch 115: Generated 1/25 contextual embeddings using batch API (sub-batch size: 50)

2025-09-05 09:32:02 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: openai

2025-09-05 09:32:02 | src.server.services.llm_provider_service | INFO | OpenAI client created successfully

2025-09-05 09:32:03 | search | WARNING | Error inserting batch (attempt 1/3): {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 1) already exists.'}

2025-09-05 09:32:04 | search | WARNING | Error inserting batch (attempt 2/3): {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 1) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed to insert batch after 3 attempts: {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 1) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 1) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 2) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 3) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 4) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 5) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 6) already exists.'}

2025-09-05 09:32:06 | search | ERROR | Failed individual insert for https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir:⁠ {'message': 'duplicate key value violates unique constraint "archon_crawled_pages_url_chunk_number_key"', 'code': '23505', 'hint': None, 'details': 'Key (url, chunk_number)=(https://nextjs.org/docs/app/api-reference/config/next-config-js/distDir, 7) already exists.'}

2025-09-05 09:32:06 | search | INFO | Individual inserts: 18/25 successful
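The repeated `23505` errors above are what a plain `INSERT` produces when a recrawl tries to store a `(url, chunk_number)` pair that already exists. A minimal sketch of the difference between a plain insert and an upsert follows. It uses an in-memory SQLite table as a stand-in for the real Postgres table, and the table name `crawled_pages` and both helper functions are hypothetical, not Archon's actual code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with the same kind of unique constraint
    as the one named in the log (url + chunk_number)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE crawled_pages (
            url TEXT NOT NULL,
            chunk_number INTEGER NOT NULL,
            content TEXT NOT NULL,
            UNIQUE (url, chunk_number)
        )
        """
    )
    return conn


def insert_chunk(conn: sqlite3.Connection, url: str, chunk_number: int, content: str) -> None:
    """Plain insert: raises sqlite3.IntegrityError if the pair already exists."""
    conn.execute(
        "INSERT INTO crawled_pages (url, chunk_number, content) VALUES (?, ?, ?)",
        (url, chunk_number, content),
    )


def upsert_chunk(conn: sqlite3.Connection, url: str, chunk_number: int, content: str) -> None:
    """Upsert: overwrites the stored content when the pair already exists."""
    conn.execute(
        """
        INSERT INTO crawled_pages (url, chunk_number, content)
        VALUES (?, ?, ?)
        ON CONFLICT (url, chunk_number) DO UPDATE SET content = excluded.content
        """,
        (url, chunk_number, content),
    )


if __name__ == "__main__":
    db = make_db()
    page = "https://example.com/docs/page"  # placeholder URL
    insert_chunk(db, page, 1, "first crawl")
    try:
        insert_chunk(db, page, 1, "recrawl")  # same key again
    except sqlite3.IntegrityError as exc:
        print("plain insert failed:", exc)
    upsert_chunk(db, page, 1, "recrawl")  # same key, no error
    print(db.execute("SELECT content FROM crawled_pages").fetchall())
```

Postgres accepts the same `ON CONFLICT ... DO UPDATE` form. Whether overwriting or deleting old chunks before a recrawl is the right behavior here is a design decision this sketch does not settle.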

Affected Component

🔍 Knowledge Base / RAG

Browser & OS

Brave 1.81.137 / Chromium: 139.0.7258.158

Additional Context

No response

Service Status (check all that are working)

  • [x] 🖥️ Frontend UI (http://localhost:3737)
  • [x] ⚙️ Main Server (http://localhost:8181)
  • [x] 🔗 MCP Service (localhost:8051)
  • [x] 🤖 Agents Service (http://localhost:8052)
  • [x] 💾 Supabase Database (connected)

phill-bramble avatar Sep 05 '25 09:09 phill-bramble

Thanks for reporting! So this specifically happens when you are performing other operations, like deleting knowledge, while a crawl is happening? Or literally just when deleting other knowledge?

coleam00 avatar Sep 06 '25 18:09 coleam00

Hi!

So yeah, I started the crawl, and admittedly it's a large crawl of a huge document. But I navigated away from the page to projects and back, and it was still there. After a while I started cleaning up some other knowledge base entries, and suddenly, after a refresh when deleting an entry, the crawl was gone from the UI but still running in the background. I'm not sure if it's related to the length of time or to the other operations; however, I can reproduce it provided the crawl or processing is lengthy.

phill-bramble avatar Sep 06 '25 18:09 phill-bramble

I have also had this happen a bunch. I have found that if I set my Archon server HOST=127.0.0.1 and only access it on 127.0.0.1, then my large crawls don't disappear. But I host my Archon instance at my office and use its IP on the Tailscale network to access it. Even when I change HOST to the Tailscale IP address, I can use all other features except adding new documents to the knowledge base: the documents do get crawled, but the information never appears on the front page. I have included my logs as well, but I don't see anything especially useful in there.

[COMPLETE] ● https://go.microsoft.com/fwlink/?linkid=2272849&clcid=0x409&culture=en-us&country=us | ✓ | ⏱: 7.01s 2025-09-09 18:52:27 | search | ERROR | Batch 0 failed: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} Traceback (most recent call last): File "/app/src/server/services/embeddings/embedding_service.py", line 224, in create_embeddings_batch response = await client.embeddings.create( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/resources/embeddings.py", line 243, in create return await self._post( ^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request return await self._request( ^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request raise self._make_status_error_from_response(err.response) from None openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} 2025-09-09 18:52:27 | search | ERROR | Batch 36: Failed to create 25 embeddings. Successful: 0. Errors: ["Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}"] 2025-09-09 18:52:27 | search | WARNING | Skipping batch 36 - no successful embeddings created 2025-09-09 18:52:27 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:27 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully [FETCH]... ↓ https://learn.microsoft.com/en-us/azure/azure-web-pubsub/howto-secure-network-access-control | ✓ | ⏱: 5.19s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/azure/azure-web-pubsub/howto-secure-network-access-control | ✓ | ⏱: 0.14s [COMPLETE] ● https://learn.microsoft.com/en-us/azure/azure-web-pubsub/howto-secure-network-access-control | ✓ | ⏱: 5.34s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/atta...omations?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 3.74s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/api/atta...omations?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 0.15s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/atta...omations?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 3.90s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/resources/onEmailOtpSendListener?view=graph-rest-beta | ✓ | ⏱: 2.71s [SCRAPE].. 
◆ https://learn.microsoft.com/en-us/graph/api/resources/onEmailOtpSendListener?view=graph-rest-beta | ✓ | ⏱: 0.10s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/resources/onEmailOtpSendListener?view=graph-rest-beta | ✓ | ⏱: 2.82s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/repo...temailappusageappsusercounts?view=graph-rest-1.0 | ✓ | ⏱: 3.64s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/api/repo...temailappusageappsusercounts?view=graph-rest-1.0 | ✓ | ⏱: 0.14s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/repo...temailappusageappsusercounts?view=graph-rest-1.0 | ✓ | ⏱: 3.78s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/resources/workbook?view=graph-rest-beta | ✓ | ⏱: 2.82s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/api/resources/workbook?view=graph-rest-beta | ✓ | ⏱: 0.11s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/resources/workbook?view=graph-rest-beta | ✓ | ⏱: 2.94s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/tena...mainname?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 2.74s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/api/tena...mainname?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 0.12s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/tena...mainname?view=graph-rest-beta&preserve-view=true | ✓ | ⏱: 2.86s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/templates/?context=graph/api/beta&view=graph-rest-beta/ | ✓ | ⏱: 2.19s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/templates/?context=graph/api/beta&view=graph-rest-beta/ | ✓ | ⏱: 0.06s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/templates/?context=graph/api/beta&view=graph-rest-beta/ | ✓ | ⏱: 2.25s [FETCH]... ↓ https://learn.microsoft.com/en-us/locale?target=...us/graph/sdks/batch-requests?view=graph-rest-1.0 | ✓ | ⏱: 1.92s [SCRAPE].. 
◆ https://learn.microsoft.com/en-us/locale?target=...us/graph/sdks/batch-requests?view=graph-rest-1.0 | ✓ | ⏱: 0.06s [COMPLETE] ● https://learn.microsoft.com/en-us/locale?target=...us/graph/sdks/batch-requests?view=graph-rest-1.0 | ✓ | ⏱: 1.98s 2025-09-09 18:52:31 | search | INFO | Batch 37: Generated 1/25 contextual embeddings using batch API (sub-batch size: 50) 2025-09-09 18:52:31 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:31 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully 2025-09-09 18:52:31 | threading | INFO | Rate limiting: waiting 4.1s [FETCH]... ↓ https://learn.microsoft.com/en-us/azure/azure-monitor/app/sampling-classic-api | ✓ | ⏱: 8.30s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/azure/azure-monitor/app/sampling-classic-api | ✓ | ⏱: 0.17s [COMPLETE] ● https://learn.microsoft.com/en-us/azure/azure-monitor/app/sampling-classic-api | ✓ | ⏱: 8.47s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/api/schedulechangerequest-approveForUser | ✓ | ⏱: 4.16s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/api/schedulechangerequest-approveForUser | ✓ | ⏱: 0.14s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/api/schedulechangerequest-approveForUser | ✓ | ⏱: 4.30s [FETCH]... ↓ https://learn.microsoft.com/en-us/openspecs/blog...qlintbloglp/4fac6802-34cf-4f7a-ab52-cb34d48913b3 | ✓ | ⏱: 2.00s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/openspecs/blog...qlintbloglp/4fac6802-34cf-4f7a-ab52-cb34d48913b3 | ✓ | ⏱: 0.08s [COMPLETE] ● https://learn.microsoft.com/en-us/openspecs/blog...qlintbloglp/4fac6802-34cf-4f7a-ab52-cb34d48913b3 | ✓ | ⏱: 2.08s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/delta-query-events?source=recommendations | ✓ | ⏱: 4.76s [SCRAPE].. 
◆ https://learn.microsoft.com/en-us/graph/delta-query-events?source=recommendations | ✓ | ⏱: 0.53s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/delta-query-events?source=recommendations | ✓ | ⏱: 5.30s [FETCH]... ↓ https://learn.microsoft.com/en-us/graph/outlook-get-mime-message?source=recommendations | ✓ | ⏱: 4.39s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/graph/outlook-get-mime-message?source=recommendations | ✓ | ⏱: 0.12s [COMPLETE] ● https://learn.microsoft.com/en-us/graph/outlook-get-mime-message?source=recommendations | ✓ | ⏱: 4.52s [SCROLL].. ℹ Failed to execute scroll: Page.evaluate: Execution context was destroyed, most likely because of a navigation [FETCH]... ↓ https://learn.microsoft.com/en-us/training/modules/data-analytics-microsoft/3-roles | ✓ | ⏱: 3.51s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/training/modules/data-analytics-microsoft/3-roles | ✓ | ⏱: 0.13s [COMPLETE] ● https://learn.microsoft.com/en-us/training/modules/data-analytics-microsoft/3-roles | ✓ | ⏱: 3.64s [FETCH]... ↓ https://learn.microsoft.com/en-us/users/na/?userid=2e81d28d-e734-4cdb-a2a2-b60fbf11d473 | ✓ | ⏱: 2.35s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/users/na/?userid=2e81d28d-e734-4cdb-a2a2-b60fbf11d473 | ✓ | ⏱: 0.06s [COMPLETE] ● https://learn.microsoft.com/en-us/users/na/?userid=2e81d28d-e734-4cdb-a2a2-b60fbf11d473 | ✓ | ⏱: 2.41s [FETCH]... ↓ https://learn.microsoft.com/en-us/users/na/?userid=90ac8cd0-6c23-4d55-b59a-fe7261c9b1a5 | ✓ | ⏱: 2.35s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/users/na/?userid=90ac8cd0-6c23-4d55-b59a-fe7261c9b1a5 | ✓ | ⏱: 0.05s [COMPLETE] ● https://learn.microsoft.com/en-us/users/na/?userid=90ac8cd0-6c23-4d55-b59a-fe7261c9b1a5 | ✓ | ⏱: 2.40s [FETCH]... ↓ https://learn.microsoft.com/en-us/training/modules/explore-event-streams-microsoft-fabric/ | ✓ | ⏱: 3.16s [SCRAPE].. 
◆ https://learn.microsoft.com/en-us/training/modules/explore-event-streams-microsoft-fabric/ | ✓ | ⏱: 0.11s [COMPLETE] ● https://learn.microsoft.com/en-us/training/modules/explore-event-streams-microsoft-fabric/ | ✓ | ⏱: 3.27s [FETCH]... ↓ https://learn.microsoft.com/en-us/power-platform/guidance/case-studies/db-empowers-citizen-devs | ✓ | ⏱: 4.93s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/power-platform/guidance/case-studies/db-empowers-citizen-devs | ✓ | ⏱: 0.12s [COMPLETE] ● https://learn.microsoft.com/en-us/power-platform/guidance/case-studies/db-empowers-citizen-devs | ✓ | ⏱: 5.05s [FETCH]... ↓ https://learn.microsoft.com/en-us/training/modul...ns-enrollment-id=learn-m365.sharepoint-associate | ✓ | ⏱: 3.70s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/training/modul...ns-enrollment-id=learn-m365.sharepoint-associate | ✓ | ⏱: 0.15s [COMPLETE] ● https://learn.microsoft.com/en-us/training/modul...ns-enrollment-id=learn-m365.sharepoint-associate | ✓ | ⏱: 3.86s [FETCH]... ↓ https://learn.microsoft.com/en-us/training/modul...rn.wwl.transform-your-business-with-microsoft-ai | ✓ | ⏱: 4.27s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/training/modul...rn.wwl.transform-your-business-with-microsoft-ai | ✓ | ⏱: 0.13s [COMPLETE] ● https://learn.microsoft.com/en-us/training/modul...rn.wwl.transform-your-business-with-microsoft-ai | ✓ | ⏱: 4.40s 2025-09-09 18:52:35 | search | ERROR | Batch 0 failed: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} Traceback (most recent call last): File "/app/src/server/services/embeddings/embedding_service.py", line 224, in create_embeddings_batch response = await client.embeddings.create( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/resources/embeddings.py", line 243, in create return await self._post( ^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request return await self._request( ^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request raise self._make_status_error_from_response(err.response) from None openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} 2025-09-09 18:52:35 | search | ERROR | Batch 37: Failed to create 25 embeddings. Successful: 0. Errors: ["Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}"] 2025-09-09 18:52:35 | search | WARNING | Skipping batch 37 - no successful embeddings created 2025-09-09 18:52:35 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:35 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully [FETCH]... ↓ https://learn.microsoft.com/en-us/windows/mixed-reality/enthusiast-guide/vr-journey | ✓ | ⏱: 3.46s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/windows/mixed-reality/enthusiast-guide/vr-journey | ✓ | ⏱: 0.09s [COMPLETE] ● https://learn.microsoft.com/en-us/windows/mixed-reality/enthusiast-guide/vr-journey | ✓ | ⏱: 3.56s [FETCH]... ↓ https://learn.microsoft.com/fr-be/odata/ | ✓ | ⏱: 2.74s [SCRAPE].. ◆ https://learn.microsoft.com/fr-be/odata/ | ✓ | ⏱: 0.08s [COMPLETE] ● https://learn.microsoft.com/fr-be/odata/ | ✓ | ⏱: 2.82s [FETCH]... ↓ https://learn.microsoft.com/eu-es/graph/api/driveitem-list-children?view=graph-rest-1.0 | ✓ | ⏱: 3.05s [SCRAPE].. ◆ https://learn.microsoft.com/eu-es/graph/api/driveitem-list-children?view=graph-rest-1.0 | ✓ | ⏱: 0.14s [COMPLETE] ● https://learn.microsoft.com/eu-es/graph/api/driveitem-list-children?view=graph-rest-1.0 | ✓ | ⏱: 3.19s [FETCH]... ↓ https://learn.microsoft.com/nb-no/power-platform/ | ✓ | ⏱: 2.99s [SCRAPE].. ◆ https://learn.microsoft.com/nb-no/power-platform/ | ✓ | ⏱: 0.08s [COMPLETE] ● https://learn.microsoft.com/nb-no/power-platform/ | ✓ | ⏱: 3.07s [FETCH]... ↓ https://learn.microsoft.com/es-es/answers/questi...r-m-s-que-pongo-en-outlook-que-mi-idioma-es-el-e | ✓ | ⏱: 4.28s [SCRAPE].. 
◆ https://learn.microsoft.com/es-es/answers/questi...r-m-s-que-pongo-en-outlook-que-mi-idioma-es-el-e | ✓ | ⏱: 0.10s [COMPLETE] ● https://learn.microsoft.com/es-es/answers/questi...r-m-s-que-pongo-en-outlook-que-mi-idioma-es-el-e | ✓ | ⏱: 4.38s [FETCH]... ↓ https://learn.microsoft.com/pl-pl/dynamics365/ | ✓ | ⏱: 3.09s [SCRAPE].. ◆ https://learn.microsoft.com/pl-pl/dynamics365/ | ✓ | ⏱: 0.08s [COMPLETE] ● https://learn.microsoft.com/pl-pl/dynamics365/ | ✓ | ⏱: 3.17s [FETCH]... ↓ https://learn.microsoft.com/en-us/dynamics365/de...ence/entities/msdyn_conversationsegmentsentiment | ✓ | ⏱: 13.16s [SCRAPE].. ◆ https://learn.microsoft.com/en-us/dynamics365/de...ence/entities/msdyn_conversationsegmentsentiment | ✓ | ⏱: 0.95s [COMPLETE] ● https://learn.microsoft.com/en-us/dynamics365/de...ence/entities/msdyn_conversationsegmentsentiment | ✓ | ⏱: 14.12s 2025-09-09 18:52:39 | search | INFO | Batch 38: Generated 1/25 contextual embeddings using batch API (sub-batch size: 50) 2025-09-09 18:52:39 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:39 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully 2025-09-09 18:52:39 | threading | INFO | Rate limiting: waiting 1.4s [FETCH]... ↓ https://support.microsoft.com/en-us/account-bill...-conditions-94295a5a-a0bb-070e-1d15-2145344b741d | ✓ | ⏱: 3.99s [SCRAPE].. ◆ https://support.microsoft.com/en-us/account-bill...-conditions-94295a5a-a0bb-070e-1d15-2145344b741d | ✓ | ⏱: 0.11s [COMPLETE] ● https://support.microsoft.com/en-us/account-bill...-conditions-94295a5a-a0bb-070e-1d15-2145344b741d | ✓ | ⏱: 4.11s [FETCH]... ↓ https://www.microsoft.com/microsoft-365/roadmap | ✓ | ⏱: 3.06s [SCRAPE].. 
◆ https://www.microsoft.com/microsoft-365/roadmap | ✓ | ⏱: 0.14s [COMPLETE] ● https://www.microsoft.com/microsoft-365/roadmap | ✓ | ⏱: 3.20s 2025-09-09 18:52:40 | search | ERROR | Batch 0 failed: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} Traceback (most recent call last): File "/app/src/server/services/embeddings/embedding_service.py", line 224, in create_embeddings_batch response = await client.embeddings.create( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/resources/embeddings.py", line 243, in create return await self._post( ^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request return await self._request( ^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request raise self._make_status_error_from_response(err.response) from None openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} 2025-09-09 18:52:40 | search | ERROR | Batch 38: Failed to create 25 embeddings. Successful: 0. Errors: ["Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}"] 2025-09-09 18:52:40 | search | WARNING | Skipping batch 38 - no successful embeddings created 2025-09-09 18:52:40 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:40 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully [FETCH]... ↓ https://techcommunity.microsoft.com/blog/windows...docs.microsoft.com/deployedge/edge-ie-mode-faq\" | ✓ | ⏱: 5.22s [SCRAPE].. ◆ https://techcommunity.microsoft.com/blog/windows...docs.microsoft.com/deployedge/edge-ie-mode-faq\" | ✓ | ⏱: 0.00s [COMPLETE] ● https://techcommunity.microsoft.com/blog/windows...docs.microsoft.com/deployedge/edge-ie-mode-faq\" | ✓ | ⏱: 5.23s [FETCH]... ↓ https://www.microsoft.com/surface/business/warranty-protection-plans-and-support?ocid=cmmjcz28if8 | ✓ | ⏱: 4.41s [SCRAPE].. ◆ https://www.microsoft.com/surface/business/warranty-protection-plans-and-support?ocid=cmmjcz28if8 | ✓ | ⏱: 0.17s [COMPLETE] ● https://www.microsoft.com/surface/business/warranty-protection-plans-and-support?ocid=cmmjcz28if8 | ✓ | ⏱: 4.58s [FETCH]... ↓ https://www.microsoft.com/en/customers/story/23921-assembly-software-azure-ai-foundry | ✓ | ⏱: 6.21s [SCRAPE].. 
◆ https://www.microsoft.com/en/customers/story/23921-assembly-software-azure-ai-foundry | ✓ | ⏱: 0.17s [COMPLETE] ● https://www.microsoft.com/en/customers/story/23921-assembly-software-azure-ai-foundry | ✓ | ⏱: 6.38s [FETCH]... ↓ https://www.microsoft.com/ja-jp/industry/blog/government/2025/05/12/microsoft-ai-tour2025-gov/ | ✓ | ⏱: 6.33s [SCRAPE].. ◆ https://www.microsoft.com/ja-jp/industry/blog/government/2025/05/12/microsoft-ai-tour2025-gov/ | ✓ | ⏱: 0.09s [COMPLETE] ● https://www.microsoft.com/ja-jp/industry/blog/government/2025/05/12/microsoft-ai-tour2025-gov/ | ✓ | ⏱: 6.43s 2025-09-09 18:52:43 | search | INFO | Batch 39: Generated 1/25 contextual embeddings using batch API (sub-batch size: 50) 2025-09-09 18:52:43 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:43 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully 2025-09-09 18:52:43 | threading | INFO | Rate limiting: waiting 2.3s 2025-09-09 18:52:46 | search | ERROR | Batch 0 failed: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} Traceback (most recent call last): File "/app/src/server/services/embeddings/embedding_service.py", line 224, in create_embeddings_batch response = await client.embeddings.create( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/resources/embeddings.py", line 243, in create return await self._post( ^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request return await self._request( ^^^^^^^^^^^^^^^^^^^^ File "/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request raise self._make_status_error_from_response(err.response) from None openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}} 2025-09-09 18:52:46 | search | ERROR | Batch 39: Failed to create 25 embeddings. Successful: 0. Errors: ["Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. 
Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}", "Failed to create embedding: Error code: 404 - {'error': {'code': 404, 'message': 'models/text-embedding-3-small is not found for API version v1main, or is not supported for embedContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}}"] 2025-09-09 18:52:46 | search | WARNING | Skipping batch 39 - no successful embeddings created 2025-09-09 18:52:46 | src.server.services.llm_provider_service | INFO | Creating LLM client for provider: google 2025-09-09 18:52:46 | src.server.services.llm_provider_service | INFO | Google Gemini client created successfully killerherts@debianDesktopi5:~$
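
Reading the log above, every 404 names `text-embedding-3-small` while the client was created for provider `google`, so the embedding model name appears not to match the configured provider. A minimal sketch of a pre-flight check that would catch that kind of mismatch before a crawl starts is below. The function `check_embedding_config` and the model table are hypothetical and deliberately incomplete; they are not part of Archon.

```python
from typing import Optional

# Illustrative, incomplete table of embedding model names per provider.
# This is an assumption for the sketch, not an authoritative list.
KNOWN_EMBEDDING_MODELS = {
    "openai": {
        "text-embedding-3-small",
        "text-embedding-3-large",
        "text-embedding-ada-002",
    },
    "google": {
        "text-embedding-004",
        "gemini-embedding-001",
    },
}


def check_embedding_config(provider: str, model: str) -> Optional[str]:
    """Return None if the pair looks consistent, otherwise a warning string."""
    known = KNOWN_EMBEDDING_MODELS.get(provider)
    if known is None:
        return f"unknown provider {provider!r}; cannot validate model {model!r}"
    if model in known:
        return None
    # Find which providers (if any) the model name is listed under.
    owners = sorted(p for p, models in KNOWN_EMBEDDING_MODELS.items() if model in models)
    if owners:
        return (
            f"model {model!r} is listed for provider(s) {owners}, "
            f"not for {provider!r}"
        )
    return f"model {model!r} is not in the known list for provider {provider!r}"


if __name__ == "__main__":
    # The combination seen in the log above.
    print(check_embedding_config("google", "text-embedding-3-small"))
    # A consistent combination prints None.
    print(check_embedding_config("openai", "text-embedding-3-small"))
```

A check like this only flags a likely misconfiguration. It does not explain why the crawl progress disappears from the UI.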

njhughes-01 avatar Sep 09 '25 18:09 njhughes-01

This should be resolved in #695. Please let me know if you still have issues.

Wirasm avatar Sep 18 '25 10:09 Wirasm