
Feature RAG with crawl4ai/vectordb

Open lvalics opened this issue 10 months ago • 3 comments

Issue

Dynamic documentation with RAG and Crawl4AI: this feature targets scenarios where we need up-to-date documentation, but the LLMs in use have not yet indexed the required information. In such cases, simply using the /web command to fetch a single page is insufficient.

Proposed workflow: install crawl4ai as an additional module. This enables fetching documentation directly from the internet by specifying either the sitemap.xml file or a URL pointing to the desired sitemap. This way, we can bring in the entire set of pages relevant to the current project.
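The sitemap step above could be sketched roughly like this. Note this is a stdlib-only stand-in, not crawl4ai's actual API: it parses a sitemap.xml document and collects every `<loc>` URL that a crawler would then fetch. The `urls_from_sitemap` helper and the sample sitemap are hypothetical.

```python
# Minimal sketch: given the XML of a sitemap.xml, collect every <loc> URL
# so the crawler knows which pages to fetch. (crawl4ai offers far richer
# crawling; this stand-in uses only the Python standard library.)
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return all page URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/intro</loc></url>
  <url><loc>https://docs.example.com/api</loc></url>
</urlset>"""

# Each extracted URL would then be handed to the crawler for fetching.
print(urls_from_sitemap(sample))
```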

Index the data into a local vector database: the extracted data is injected into a vector database (local, Qdrant, or Supabase, for example) and tagged with the current project's identifier. This step ensures quick and accurate searches later.
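The indexing step could look something like the sketch below. Everything here is a labeled stand-in: a toy word-count "embedding" replaces a real embedding model, and an in-memory list replaces the Qdrant/Supabase collection. The point is only the shape of the data, in particular the project-id tag attached to each chunk.

```python
# Hedged sketch of the indexing step: chunk crawled text, "embed" each chunk,
# and store it tagged with a project id. The Counter-based embedding and the
# plain list are toy stand-ins for a real model and a real vector DB.
from collections import Counter

def embed(text: str) -> Counter:
    # stand-in embedding: a word-count vector (a real setup would call a model)
    return Counter(text.lower().split())

def chunk(text: str, size: int = 50) -> list[str]:
    # naive fixed-size word chunking; real pipelines split more carefully
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

store: list[dict] = []  # stand-in for a vector DB collection

def index_document(text: str, url: str, project_id: str) -> None:
    for piece in chunk(text):
        store.append({
            "project": project_id,  # tag so later queries stay project-scoped
            "url": url,
            "text": piece,
            "vector": embed(piece),
        })

index_document("aider is AI pair programming in your terminal",
               "https://docs.example.com/intro", "my-project")
print(len(store), store[0]["project"])
```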

Query the database with /rag: when the /rag command is used, Aider performs a query on the vector database before sending the request to the LLM. The relevant information is extracted and sent along with the prompt, ensuring precise and well-informed responses.
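The /rag retrieval step described above might be sketched as follows. Again this is illustrative only, not aider's API: the same toy word-count embedding stands in for a real model, the `store` list stands in for the vector DB, and `rag_query` shows the core idea of ranking project-scoped chunks by cosine similarity and prepending the top hits to the prompt.

```python
# Sketch of the /rag step: embed the user's question, rank stored chunks by
# cosine similarity (filtered to the current project), and prepend the top
# matches to the prompt that would be sent to the LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # same toy word-count embedding as a stand-in for a real model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# tiny pre-populated stand-in for the vector DB collection
store = [
    {"project": "my-project", "text": "crawl4ai fetches documentation pages",
     "vector": embed("crawl4ai fetches documentation pages")},
    {"project": "other", "text": "unrelated notes",
     "vector": embed("unrelated notes")},
]

def rag_query(question: str, project_id: str, top_k: int = 2) -> str:
    qvec = embed(question)
    hits = sorted(
        (e for e in store if e["project"] == project_id),  # project-scoped
        key=lambda e: cosine(qvec, e["vector"]), reverse=True,
    )[:top_k]
    context = "\n".join(e["text"] for e in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"

print(rag_query("how does crawl4ai fetch documentation?", "my-project"))
```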

Benefits:

- On-demand documentation: instant access to information from sources relevant to the current project.
- Seamless integration: the workflow is simple, scalable, and adaptable to complex projects.
- Increased efficiency: the LLM gains additional context, reducing uncertainty and the time needed for accurate responses.

Source of the idea: https://github.com/coleam00/ottomator-agents/tree/main/crawl4AI-agent

Version and model info

No response

lvalics avatar Jan 13 '25 10:01 lvalics

Also, I would like to see Perplexity added to the workflow: https://sonar.perplexity.ai/

https://github.com/Aider-AI/aider/issues/3053

michabbb avatar Jan 29 '25 02:01 michabbb


What happens now when you do /web <url>? Does it send the URL to Anthropic, for example? Isn't that the same thing?

andupotorac avatar Apr 25 '25 22:04 andupotorac

There is an amazing project called aider-desk (which uses aider under the hood) that has MCP integrated, so you can use whatever you want. It's amazing and I can highly recommend it ❤ With that you can add context7 (for example) and get access to the docs of ~5k libraries/frameworks 🚀

michabbb avatar Apr 25 '25 22:04 michabbb