
ADR: RAG Refinement Approaches

Open gphorvath opened this issue 2 years ago • 3 comments

Things that affect RAG performance:

  • Text Splitting
  • ...

gphorvath avatar Mar 18 '24 16:03 gphorvath

Techniques

  • LlamaIndex retrievers and QueryEngines
    • I'd like to draw specific attention to the following engines/retrievers:
      • Sub Question Query Engine as a way to ensure the question is adequately answered. If feasible, perform sub-queries to some depth in order to further improve quality.
      • Ensemble Query Engine to combine the results of many techniques, covering one technique's potential weaknesses with another. When used in conjunction with an additional screening layer, better results may be achieved.
      • BM25 Retriever: sometimes the best results come from a basic keyword search. Probably worth always running unless there's a performance reason not to.
  • LlamaIndex node processors and rerankers
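
The BM25 keyword retrieval mentioned above can be sketched in pure Python. This is a minimal Okapi BM25 scorer over a toy corpus (the documents and the whitespace tokenization are illustrative assumptions; LlamaIndex's BM25Retriever or the `rank_bm25` package would be used in practice):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    q_terms = query.lower().split()
    # document frequency for each query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "postgres stores vectors with pgvector",
]
print(bm25_scores("cat mat", docs))
```

Note the exact-token matching: "cats" does not match "cat", which is why domain acronyms and morphology (stemming) matter for keyword search.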

Considerations

  • Communicating the level of information available in the context. This isn't exhaustive, but some outlier situations need to be expected:
    • When the context returns no results relevant to the user's request, how should the LLM reply?
    • When the context returns results that only partially answer the user's request, how should the LLM reply?
    • When the context returns multiple results that answer the user's query but contain conflicting information, how should the LLM reply?
    • Distinguish between results that are related to the user's query and those that resolve the user's query
  • Displaying the information sources. These should be returned by pretty much any RAG solution as long as the filenames/page/chunk info is stored in metadata and isn't mangled.
  • Screening responses that come back from the RAG before they're sent to the LLM to be shown to the user, with the aim of eliminating irrelevant results (like referring to a document that has nothing to do with the query) or identifying when there's conflicting information. Query Pipelines and the Multi-Step Query Engine may be useful here.
  • Determine confidence in a result's relevance to the query; this could be returned to the user to prod them to look into something more themselves.
    • This could be provided by the LLM via a natural language response or numerical value if using a structured generation tool
    • Rerankers may assign relevance scores that can also be reused for this purpose.
  • Handle acronyms showing up in the text and resolve them to items that are actually related to the domain
    • Domain specific acronyms are unlikely to neatly work with a vector db out of the box.
  • How should documents of different types (pdf, excel, sql, etc.) be parsed/chunked to provide the best results? Should we do anything beyond using the document loaders provided by langchain/llamaindex?
  • Create detailed prompts to handle the different states (like the ones described above) that the system can be in and use/transition between them
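
The context states listed above (no relevant results, partial answers, conflicts) can be handled with explicit instructions in the system prompt. A minimal sketch, with hypothetical prompt wording and a made-up chunk/metadata shape:

```python
RAG_SYSTEM_PROMPT = """\
You answer questions using ONLY the context below.
- If the context contains no relevant information, say so plainly and do not guess.
- If the context only partially answers the question, answer what you can
  and state what is missing.
- If sources conflict, present both versions and cite each source.

Context:
{context}
"""

def build_prompt(chunks):
    """Join retrieved chunks (text plus source metadata) into the prompt."""
    if not chunks:
        context = "(no relevant documents were retrieved)"
    else:
        context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return RAG_SYSTEM_PROMPT.format(context=context)

print(build_prompt([{"source": "faq.pdf#p3", "text": "Resets require admin rights."}]))
```

Labeling each chunk with its source inline is also what makes the "display the information sources" bullet cheap: the model can cite the same identifiers the UI shows.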

Hyperparameter Tuning

  • top k, chunk size, overlap size, etc.: determine the best values for all of our params, either using a service or by creating a framework ourselves
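
A framework for this could start as a plain grid search. Sketch below; `evaluate` is a placeholder for whatever eval harness we land on (e.g. replaying a QA benchmark through the pipeline), and the toy scoring function exists only so the example runs:

```python
from itertools import product

def evaluate(top_k, chunk_size, overlap):
    """Placeholder: run the RAG eval suite for one config and return a score.
    The toy formula below just makes the sketch executable."""
    return -abs(top_k - 5) - abs(chunk_size - 512) / 256 - overlap / 100

grid = {
    "top_k": [3, 5, 10],
    "chunk_size": [256, 512, 1024],
    "overlap": [0, 50, 100],
}

# exhaustively score every combination and keep the best
best = max(product(*grid.values()), key=lambda params: evaluate(*params))
print(dict(zip(grid, best)))
```

Grid search is fine at this scale (27 configs); if the param space grows, a service or a smarter search (random/Bayesian) is the natural next step.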

Verification & Hallucinations

  • RAG grounds domain-specific information in retrieved documents; for all other (general) information it does not.
    • A simple solution: retrieve additional context based on nouns (using NLP libraries like spaCy or NLTK) or acronyms that show up in the text.
      • This can be expanded out to n-depth if necessary, where information is looked up about the results of the results.
    • Knowledge Graphs built from information sources like Wikipedia can be good for this
    • RAG could also be used for this, if there's a separate collection for general knowledge
  • Knowledge Graphs, for verification
    • Tools for Knowledge Graphs and Construction
      • https://www.diffbot.com/products/knowledge-graph/
      • https://neptune.ai/blog/web-scraping-and-knowledge-graphs-machine-learning
      • Langchain/llamaindex Integration
        • https://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/
        • https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/
        • https://python.langchain.com/docs/modules/memory/types/kg
        • https://bratanic-tomaz.medium.com/constructing-knowledge-graphs-from-text-using-openai-functions-096a6d010c17
      • https://pypi.org/project/kgdata/
      • https://rdflib.readthedocs.io/en/stable/
      • https://github.com/dylanhogg/llmgraph
        • https://pypi.org/project/llmgraph/
    • Knowledge Base Data
      • https://huggingface.co/datasets/wikipedia
      • https://kbpedia.org/
        • https://kbpedia.org/resources/downloads/
        • http://sparql.kbpedia.org/
      • https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data
      • https://www.wikidata.org/wiki/Wikidata:Database_download
        • https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer
      • https://www.kaggle.com/datasets/therohk/cyc-kb
      • http://dev.dbpedia.org/Download_DBpedia
      • More
        • https://huggingface.co/datasets
        • https://www.kaggle.com/datasets
        • https://datasetsearch.research.google.com/
    • GraphDBs
      • Examples
        • Postgres (extension)
          • https://age.apache.org/
            • https://github.com/apache/age
          • https://www.dylanpaulus.com/posts/postgres-is-a-graph-database/
        • neo4j
          • https://neo4j.com/
          • https://github.com/neo4j/neo4j
      • More comprehensive list
        • https://github.com/jbmusso/awesome-graph
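
The noun/acronym-driven verification lookup described above can be sketched end to end with an in-memory triple store standing in for a real knowledge graph (in practice one of the KG tools/datasets linked above). The triples, entity list, and substring matching are all illustrative assumptions:

```python
# Minimal in-memory triple store standing in for a real knowledge graph.
TRIPLES = [
    ("LeapfrogAI", "is_a", "AI platform"),
    ("pgvector", "is_a", "Postgres extension"),
    ("pgvector", "provides", "vector similarity search"),
]

def lookup(entity):
    """Return all (predicate, object) facts whose subject matches `entity`."""
    return [(p, o) for s, p, o in TRIPLES if s.lower() == entity.lower()]

def verify_mentions(text, known_entities=("LeapfrogAI", "pgvector")):
    """Naive entity spotting: for each known entity mentioned in `text`,
    pull supporting facts the LLM's claims can be checked against.
    A real system would use spaCy/NLTK NER instead of substring matching."""
    facts = {}
    for entity in known_entities:
        if entity.lower() in text.lower():
            facts[entity] = lookup(entity)
    return facts

print(verify_mentions("pgvector speeds up similarity search"))
```

Expanding to n-depth means feeding the objects of the returned triples back into `lookup` and repeating.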

Embeddings

  • https://huggingface.co/spaces/mteb/leaderboard

Text Splitting & Chunking

  • https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
  • https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/agentic_chunker.py
  • Visualize chunking techniques: https://chunkviz.up.railway.app/
  • https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
  • https://python.langchain.com/docs/modules/data_connection/document_transformers/
  • https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a
  • https://www.pinecone.io/learn/chunking-strategies/
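
The baseline every splitter in the links above improves on is fixed-size chunking with overlap. A minimal sketch (character-based; the token-based variants work the same way):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split `text` into fixed-size character chunks, each sharing
    `overlap` characters with the previous chunk so sentences cut at a
    boundary still appear whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 50, chunk_size=100, overlap=20)
print(len(chunks), len(chunks[0]))
```

Semantic and agentic splitters (first two links) replace the fixed `step` with boundaries chosen by embedding similarity or an LLM, but the overlap idea carries over.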

Ingestion - How to go about ingesting different file formats

RAG Frameworks

  • https://github.com/SciPhi-AI/R2R/tree/main
  • https://embedchain.ai/

Existing RAG Solutions

  • Using Supabase
    • https://supabase.com/docs/guides/ai/rag-with-permissions
    • https://supabase.com/docs/guides/ai/langchain?database-method=sql
    • https://supabase.com/docs/guides/ai/integrations/llamaindex
    • https://www.neum.ai/
      • https://www.neum.ai/post/real-time-data-embedding-and-indexing-for-rag-with-neum-and-supabase
      • https://medium.com/@neum_ai/building-scalable-rag-pipelines-with-neum-ai-framework-part-1-859837786977
      • https://github.com/NeumTry/NeumAI
      • https://medium.com/@neum_ai/retrieval-augmented-generation-at-scale-building-a-distributed-system-for-synchronizing-and-eaa29162521
    • https://github.com/supabase-community/chatgpt-your-files
    • https://github.com/supabase-community/deno-fresh-openai-doc-search
    • https://github.com/different-ai/embedbase
      • https://github.com/different-ai/embedbase/tree/main/supabase
    • https://github.com/QuivrHQ/quivr
      • https://github.com/QuivrHQ/quivr/tree/65c0ed505e4ce23b09c9f2becc20fe4797bbd219?tab=readme-ov-file#60-seconds-installation-
      • https://github.com/QuivrHQ/quivr/tree/65c0ed505e4ce23b09c9f2becc20fe4797bbd219/supabase
    • https://github.com/gannonh/chatgpt-pgvector
  • Other Solutions
    • https://github.com/weaviate/Verba
    • https://github.com/RafayKhattak/LlamaDoc
    • https://github.com/tensorlakeai/indexify
    • https://github.com/cpacker/MemGPT?tab=readme-ov-file
    • https://github.com/zilliztech/GPTCache
    • https://github.com/deepset-ai/haystack
    • https://github.com/neuml/txtai
    • https://github.com/junruxiong/IncarnaMind/

Examples

  • https://www.sbert.net/examples/applications/semantic-search/README.html#semantic-search
  • https://www.sbert.net/examples/applications/retrieve_rerank/README.html#retrieve-re-rank
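
The retrieve & re-rank pattern from the second SBERT example: a cheap first stage over the whole corpus, then a more expensive scorer over the short list. In this sketch, word overlap stands in for the bi-encoder and a length-normalized overlap for the cross-encoder; the corpus is made up:

```python
def retrieve(query, docs, k=3):
    """First stage: cheap word-overlap score (stand-in for a bi-encoder)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query, candidates):
    """Second stage: pricier score on the short list only (stand-in for a
    cross-encoder). Here: overlap normalized by length, favoring focused hits."""
    q = set(query.lower().split())
    def score(d):
        toks = d.lower().split()
        return len(q & set(toks)) / (1 + len(toks))
    return sorted(candidates, key=score, reverse=True)

docs = [
    "pgvector adds vector search to postgres",
    "postgres is a relational database",
    "the weather is nice today",
    "vector search finds similar embeddings",
]
ranked = rerank("vector search postgres", retrieve("vector search postgres", docs))
print(ranked[0])
```

The structure is the point: the expensive scorer only ever sees `k` candidates, which is what makes cross-encoder reranking affordable. The reranker scores are also a natural source for the relevance-confidence signal discussed under Considerations.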

Docs/Research

  • https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/
  • https://docs.arize.com/phoenix/retrieval/concepts-retrieval/benchmarking-retrieval-rag
  • https://www.solita.fi/blogs/building-robust-language-models-with-rag-one-pitfall-at-a-time/

CollectiveUnicorn avatar Mar 29 '24 22:03 CollectiveUnicorn

The llama-index guide just uses Vecs under the hood: https://github.com/supabase/vecs

We probably need to use something a little more robust than Vecs.

gphorvath avatar Apr 02 '24 00:04 gphorvath

PGVector Performance

  • https://jkatz05.com/post/postgres/pgvector-performance-150x-speedup/
  • https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
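
The binary-quantization idea from the second post can be illustrated in a few lines: keep one sign bit per embedding dimension and compare with Hamming distance, trading recall for a 32x smaller index and very cheap comparisons (toy 4-dimensional vectors below; real embeddings have hundreds of dimensions, and pgvector does this server-side):

```python
def binarize(vec):
    """Binary-quantize an embedding: one bit per dimension (the sign)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between two bit-packed vectors."""
    return bin(a ^ b).count("1")

q  = binarize([0.8, -0.1, 0.3, -0.5])
d1 = binarize([0.9, -0.2, 0.4, -0.4])   # similar direction to q
d2 = binarize([-0.7, 0.6, -0.2, 0.3])   # roughly opposite direction
print(hamming(q, d1), hamming(q, d2))
```

In practice the quantized index is used for a fast candidate pass and the full-precision vectors rescore the top hits, mirroring the retrieve & re-rank pattern above.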

CollectiveUnicorn avatar Aug 26 '24 20:08 CollectiveUnicorn