Should my crawled_pages and code_examples tables be empty?

Open streeyt opened this issue 4 months ago • 23 comments

I've just added some knowledge to my Archon instance by crawling the docs sites of Convex and Next.js. Everything seemed to process OK (pages, code examples, etc.) and both crawls completed without issue.

However, in my Supabase project both the crawled_pages and code_examples tables are empty. The sources table shows entries for Convex and Next.js, and I see things in Settings and so on.

Is this expected?

Image

streeyt avatar Aug 20 '25 16:08 streeyt

This is not expected, have you refreshed the supabase page?

Also can you check if you have any errors in the Archon-Server logs in Docker?

Wirasm avatar Aug 20 '25 16:08 Wirasm

I have the same issue. Some people said it is caused by not using an embedding model such as gemini-embedding-001, but I have no idea how to set that up. Which provider did you use for scraping?

mazemaster9 avatar Aug 20 '25 17:08 mazemaster9

@streeyt

Are you using the database instance from supabase.com or a local instance?

@Wirasm I think this is related to #302.

The 'code examples' (from my testing) only work if you have an LLM API key defined in the 'Settings' page and you are using a supabase.com instance. If you are using a local supabase instance, the storing of the knowledge and code examples does not work, as described in #302.

Dandaman42 avatar Aug 20 '25 17:08 Dandaman42

@Dandaman42

Yes, I refreshed the Supabase page numerous times.

I'm using the hosted (Supabase.com) Supabase, not local.

I have OpenAI and Gemini API keys configured in Archon. RAG is using gemini-2.5-flash with the gemini-embedding-001 embedding model. To be honest, I didn't have those set up for the first two crawls I did, and the tables were already empty then. I now have the RAG settings configured and the tables are still empty after subsequent crawls.

The Archon UI shows lots of crawled pages and there were plenty of code examples extracted, especially from Next.js, but still nothing in what seem to be the relevant tables @ Supabase.

Image

streeyt avatar Aug 20 '25 17:08 streeyt

Try changing your embedding dimension in the .env to 3072, as that's the standard for gemini-embedding-001:

  1. Go to .env
  2. Find EMBEDDING_DIMENSIONS=1536
  3. Change the value from 1536 to 3072

Let me know if that helps.
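
For anyone who wants to sanity-check this before re-crawling, here is a minimal sketch (not part of Archon) that compares EMBEDDING_DIMENSIONS in the .env text against a model's default output size. The function names and the small lookup table are hypothetical, and falling back to 1536 when the variable is missing is an assumption based on the value shown in the steps above.

```python
# Illustrative table of default output sizes; extend it for the models you use.
DEFAULT_DIMENSIONS = {
    "gemini-embedding-001": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_dimensions(env_text: str, model: str) -> str:
    """Return a readable verdict for EMBEDDING_DIMENSIONS versus the model."""
    env = parse_env(env_text)
    # Assumption: 1536 is used when the variable is not set.
    configured = int(env.get("EMBEDDING_DIMENSIONS", "1536"))
    expected = DEFAULT_DIMENSIONS.get(model)
    if expected is None:
        return f"unknown model {model!r}; cannot check"
    if configured == expected:
        return f"ok: {configured}"
    return f"mismatch: .env has {configured}, {model} defaults to {expected}"
```

Usage would be something like `check_dimensions(open(".env").read(), "gemini-embedding-001")`, run from the folder that holds your .env.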

Wirasm avatar Aug 20 '25 17:08 Wirasm

Thanks but sadly that didn't help. I made the edit, restarted the Docker images and re-crawled a couple of my Knowledge entries. Everything appears to be working ok in the UI, but I never get anything written to the relevant db tables in Supabase.

Image

streeyt avatar Aug 20 '25 21:08 streeyt

It seems most likely that there is an issue with the dimensions. I will look into it.

Wirasm avatar Aug 21 '25 08:08 Wirasm

Probably not relevant, but I meant to mention, this is happening on two separate Archon installs for me, one on my desktop Mac and one on my MacBook. Same config and db

streeyt avatar Aug 21 '25 08:08 streeyt

I don't know if this issue is the same as mine; however, working from a subfolder under Archon resolved all my RAG issues. I'm on Windows 11, using Claude Code on Windows (not WSL).

day-trading-oracle avatar Aug 21 '25 11:08 day-trading-oracle

Had the same issue. It turns out I chose the wrong embedding model for Gemini (I forgot to change it from the default to gemini-embedding-001).

rennyS avatar Aug 21 '25 18:08 rennyS

That's strange. I'm using the same Gemini embedding model as you and I get nothing written to the DB tables.

streeyt avatar Aug 21 '25 18:08 streeyt

Image

rennyS avatar Aug 21 '25 19:08 rennyS

How can you even change models? I copy the API key from AI Studio and paste it into the UI. That one API key is for all models. How and where do you choose models? There is only a static text entry area in the UI, and I guess it's only for naming. There is no drop-down menu to choose a model.

mazemaster9 avatar Aug 21 '25 19:08 mazemaster9

you have to manually enter the model name

rennyS avatar Aug 21 '25 19:08 rennyS

Image

I think I have the same settings that you have. The only difference I can see is that I checked "Use contextual embeddings" - could that be related?

Image

streeyt avatar Aug 21 '25 19:08 streeyt

I am having a similar problem. Just from a basic user experience perspective, not having this as a drop-down with known compatible configurations was a bit of an issue on initial setup.

Aston77 avatar Aug 21 '25 20:08 Aston77

Yes, that's it! After reinstalling Archon and changing the model to gemini-embedding-001, it's working for me now.

mazemaster9 avatar Aug 21 '25 21:08 mazemaster9

Do you have "Use contextual embeddings" checked?

streeyt avatar Aug 21 '25 21:08 streeyt

I had 'text-embedding-001' instead of 'gemini-embedding-001'

This worked
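
Since the model field is free text, a typo like this one is easy to make. A rough sketch of the kind of check that could catch it, assuming a hand-maintained list of known model names (the list and the function name are illustrative, not something Archon ships):

```python
import difflib

# Illustrative list only; add whatever embedding models you actually use.
KNOWN_EMBEDDING_MODELS = [
    "gemini-embedding-001",
    "text-embedding-3-small",
    "text-embedding-3-large",
]


def validate_model_name(name: str) -> tuple[bool, list[str]]:
    """Return (is_known, suggestions) for a manually entered model name."""
    cleaned = name.strip()
    if cleaned in KNOWN_EMBEDDING_MODELS:
        return True, []
    # Offer the closest known names so a typo is easy to spot and fix.
    suggestions = difflib.get_close_matches(
        cleaned, KNOWN_EMBEDDING_MODELS, n=3, cutoff=0.6
    )
    return False, suggestions
```

Calling `validate_model_name("text-embedding-001")` reports the name as unknown and lists the closest known names, which is roughly what a drop-down would have prevented in the first place.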

Image

Aston77 avatar Aug 21 '25 21:08 Aston77

I am facing the same issue

Crofter777 avatar Aug 22 '25 15:08 Crofter777

What I have noticed is that it will silently fail if you exceed the Gemini free tier. It crawls the webpage, but it won't notify you if the embeddings aren't being created.

In some ways, it would actually be better as two separate processes...

  1. Crawl and allow for review of what was crawled at a greater depth so that you could have some insight into what depth level was needed for your purposes.

  2. Then allow embeddings to be created at that depth.

As it currently stands, there really isn't a great deal of visibility into what you are embedding outside of reviewing the table entries, which itself isn't a user-friendly experience.
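
To illustrate what a non-silent embedding step might look like, here is a minimal sketch, assuming a caller-supplied `embed` function that returns one vector per chunk and raises on provider errors such as an exhausted quota. `embed_all`, `EmbeddingError`, and `expected_dim` are hypothetical names, not Archon's actual code.

```python
from typing import Callable, Sequence


class EmbeddingError(RuntimeError):
    """Raised when one or more chunks could not be embedded."""


def embed_all(
    chunks: Sequence[str],
    embed: Callable[[str], list[float]],
    expected_dim: int,
) -> list[list[float]]:
    """Embed every chunk, failing loudly instead of returning partial results."""
    vectors: list[list[float]] = []
    failures: list[str] = []
    for index, chunk in enumerate(chunks):
        try:
            vector = embed(chunk)
        except Exception as exc:  # e.g. a quota or rate-limit error
            failures.append(f"chunk {index}: {exc}")
            continue
        if len(vector) != expected_dim:
            failures.append(
                f"chunk {index}: got {len(vector)} dimensions, expected {expected_dim}"
            )
            continue
        vectors.append(vector)
    if failures:
        # Surface the problem to the caller rather than storing nothing quietly.
        summary = "; ".join(failures[:3])
        raise EmbeddingError(
            f"{len(failures)} of {len(chunks)} chunks failed: {summary}"
        )
    return vectors
```

With something along these lines, a crawl that finishes but produces no embeddings would end in a visible error instead of empty tables.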

Aston77 avatar Aug 23 '25 18:08 Aston77

Good point, @Aston77.

Wirasm avatar Aug 27 '25 18:08 Wirasm

I don't want to duplicate, so I'm dropping a link. The migration/DB tables also expect a specific embedding size.

EDIT: quoting the link:

The .env has the environment variable EMBEDDING_DIMENSIONS, but migration/complete_setup.sql creates tables expecting the 1536 size. Using migration/RESET_DB.sql and manually changing the references to 1536 in migration/complete_setup.sql to the same value I set in EMBEDDING_DIMENSIONS has shown success on knowledge crawl (tested with and without "Use Contextual Embeddings"; no further testing).
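
The manual edit could be scripted roughly like this. It is only a sketch: it assumes the 1536 references in the SQL appear in the pgvector column form `vector(1536)`, which I have not confirmed covers every reference in the file, and `set_vector_dimension` is a made-up helper name.

```python
import re


def set_vector_dimension(
    sql: str, new_dim: int, old_dim: int = 1536
) -> tuple[str, int]:
    """Rewrite vector(old_dim) to vector(new_dim); return (new_sql, replacements)."""
    # Tolerates case differences and spaces, e.g. "VECTOR( 1536 )".
    pattern = re.compile(rf"(?i)\bvector\s*\(\s*{old_dim}\s*\)")
    return pattern.subn(f"vector({new_dim})", sql)
```

You would read migration/complete_setup.sql into a string, call `set_vector_dimension(text, 3072)`, check that the replacement count looks right, and review the result by hand before running it against the database.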

theProf avatar Sep 04 '25 15:09 theProf