
ADR: RAG Refinement Approaches

Open gphorvath opened this issue 2 years ago • 3 comments

Things that affect RAG performance:

  • Text Splitting
  • ...

gphorvath avatar Mar 18 '24 16:03 gphorvath

Techniques

  • LlamaIndex retrievers and QueryEngines
    • I'd like to draw specific attention to the following engines/retrievers:
      • Sub Question Query Engine as a way to ensure the question is adequately answered. If feasible, perform sub-queries to some depth in order to further improve quality.
      • Ensemble Query Engine to combine the results of many techniques, covering one technique's potential weaknesses with another. When used in conjunction with an additional screening layer, better results may be achieved.
      • BM25 Retriever: sometimes the best results come from a basic keyword search. Probably worth always running unless there's a performance reason not to.
  • LlamaIndex node processors and rerankers
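
The BM25 keyword retrieval mentioned above can be sketched in pure Python. This is a minimal Okapi BM25 scorer over a toy corpus (the documents and the whitespace tokenization are illustrative assumptions; LlamaIndex's BM25Retriever or the `rank_bm25` package would be used in practice):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    q_terms = query.lower().split()
    # document frequency for each query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "postgres stores vectors with pgvector",
]
print(bm25_scores("cat mat", docs))
```

Note the exact-token matching: "cats" does not match "cat", which is why domain acronyms and morphology (stemming) matter for keyword search.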

Considerations

  • Communicating the level of information available in the context. This isn't exhaustive, but some outlier situations need to be expected:
    • When the context returns no results relevant to the user's request, how should the LLM reply?
    • When the context returns results that only partially answer the user's request, how should the LLM reply?
    • When the context returns multiple results that answer the user's query but contain conflicting information, how should the LLM reply?
    • Distinguish between results that are related to the user's query and those that resolve the user's query
  • Displaying the information sources. These should be returned by pretty much any RAG solution as long as the filenames/page/chunk info is stored in metadata and isn't mangled.
  • Screening responses that come back from the RAG before they're sent to the LLM to be shown to the user, with the aim of eliminating irrelevant results (like referring to a document that has nothing to do with the query) or identifying when there's conflicting information. Query Pipelines and the Multi-Step Query Engine may be useful here.
  • Determine confidence in a result's relevance to the query; this could be returned to the user to prod them to look into something more themselves.
    • This could be provided by the LLM via a natural language response or numerical value if using a structured generation tool
    • Rerankers may assign relevance scores that can also be reused for this purpose.
  • Handle acronyms showing up in the text and resolve them to items that are actually related to the domain
    • Domain specific acronyms are unlikely to neatly work with a vector db out of the box.
  • How should documents of different types (pdf, excel, sql, etc.) be parsed/chunked to provide the best results? Should we do anything beyond using the document loaders provided by langchain/llamaindex?
  • Create detailed prompts to handle the different states (like the ones described above) that the system can be in and use/transition between them
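
The context states listed above (no relevant results, partial answers, conflicts) can be handled with explicit instructions in the system prompt. A minimal sketch, with hypothetical prompt wording and a made-up chunk/metadata shape:

```python
RAG_SYSTEM_PROMPT = """\
You answer questions using ONLY the context below.
- If the context contains no relevant information, say so plainly and do not guess.
- If the context only partially answers the question, answer what you can
  and state what is missing.
- If sources conflict, present both versions and cite each source.

Context:
{context}
"""

def build_prompt(chunks):
    """Join retrieved chunks (text plus source metadata) into the prompt."""
    if not chunks:
        context = "(no relevant documents were retrieved)"
    else:
        context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return RAG_SYSTEM_PROMPT.format(context=context)

print(build_prompt([{"source": "faq.pdf#p3", "text": "Resets require admin rights."}]))
```

Labeling each chunk with its source inline is also what makes the "display the information sources" bullet cheap: the model can cite the same identifiers the UI shows.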

Hyperparameter Tuning

  • top k, chunk size, overlap size, etc.: determine the best values for all of our params, either using a service or by creating a framework ourselves
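
A framework for this could start as a plain grid search. Sketch below; `evaluate` is a placeholder for whatever eval harness we land on (e.g. replaying a QA benchmark through the pipeline), and the toy scoring function exists only so the example runs:

```python
from itertools import product

def evaluate(top_k, chunk_size, overlap):
    """Placeholder: run the RAG eval suite for one config and return a score.
    The toy formula below just makes the sketch executable."""
    return -abs(top_k - 5) - abs(chunk_size - 512) / 256 - overlap / 100

grid = {
    "top_k": [3, 5, 10],
    "chunk_size": [256, 512, 1024],
    "overlap": [0, 50, 100],
}

# exhaustively score every combination and keep the best
best = max(product(*grid.values()), key=lambda params: evaluate(*params))
print(dict(zip(grid, best)))
```

Grid search is fine at this scale (27 configs); if the param space grows, a service or a smarter search (random/Bayesian) is the natural next step.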

Verification & Hallucinations

  • RAG grounds domain-specific information in retrieved documents; for all other (general) information it does not.
    • A simple solution: retrieve additional context based on nouns (using NLP libraries like spaCy or NLTK) or acronyms that show up in the text.
      • This can be expanded out to n-depth if necessary, where information is looked up about the results of the results.
    • Knowledge Graphs built from information sources like Wikipedia can be good for this
    • RAG could also be used for this, if there's a separate collection for general knowledge
  • Knowledge Graphs, for verification
    • Tools for Knowledge Graphs and Construction
      • https://www.diffbot.com/products/knowledge-graph/
      • https://neptune.ai/blog/web-scraping-and-knowledge-graphs-machine-learning
      • Langchain/llamaindex Integration
        • https://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/
        • https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/
        • https://python.langchain.com/docs/modules/memory/types/kg
        • https://bratanic-tomaz.medium.com/constructing-knowledge-graphs-from-text-using-openai-functions-096a6d010c17
      • https://pypi.org/project/kgdata/
      • https://rdflib.readthedocs.io/en/stable/
      • https://github.com/dylanhogg/llmgraph
        • https://pypi.org/project/llmgraph/
    • Knowledge Base Data
      • https://huggingface.co/datasets/wikipedia
      • https://kbpedia.org/
        • https://kbpedia.org/resources/downloads/
        • http://sparql.kbpedia.org/
      • https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data
      • https://www.wikidata.org/wiki/Wikidata:Database_download
        • https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer
      • https://www.kaggle.com/datasets/therohk/cyc-kb
      • http://dev.dbpedia.org/Download_DBpedia
      • More
        • https://huggingface.co/datasets
        • https://www.kaggle.com/datasets
        • https://datasetsearch.research.google.com/
    • GraphDBs
      • Examples
        • Postgres (extension)
          • https://age.apache.org/
            • https://github.com/apache/age
          • https://www.dylanpaulus.com/posts/postgres-is-a-graph-database/
        • neo4j
          • https://neo4j.com/
          • https://github.com/neo4j/neo4j
      • More comprehensive list
        • https://github.com/jbmusso/awesome-graph
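
The noun/acronym-driven verification lookup described above can be sketched end to end with an in-memory triple store standing in for a real knowledge graph (in practice one of the KG tools/datasets linked above). The triples, entity list, and substring matching are all illustrative assumptions:

```python
# Minimal in-memory triple store standing in for a real knowledge graph.
TRIPLES = [
    ("LeapfrogAI", "is_a", "AI platform"),
    ("pgvector", "is_a", "Postgres extension"),
    ("pgvector", "provides", "vector similarity search"),
]

def lookup(entity):
    """Return all (predicate, object) facts whose subject matches `entity`."""
    return [(p, o) for s, p, o in TRIPLES if s.lower() == entity.lower()]

def verify_mentions(text, known_entities=("LeapfrogAI", "pgvector")):
    """Naive entity spotting: for each known entity mentioned in `text`,
    pull supporting facts the LLM's claims can be checked against.
    A real system would use spaCy/NLTK NER instead of substring matching."""
    facts = {}
    for entity in known_entities:
        if entity.lower() in text.lower():
            facts[entity] = lookup(entity)
    return facts

print(verify_mentions("pgvector speeds up similarity search"))
```

Expanding to n-depth means feeding the objects of the returned triples back into `lookup` and repeating.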

Embeddings

  • https://huggingface.co/spaces/mteb/leaderboard

Text Splitting & Chunking

  • https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb
  • https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/agentic_chunker.py
  • Visualize chunking techniques: https://chunkviz.up.railway.app/
  • https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
  • https://python.langchain.com/docs/modules/data_connection/document_transformers/
  • https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a
  • https://www.pinecone.io/learn/chunking-strategies/
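
The baseline every splitter in the links above improves on is fixed-size chunking with overlap. A minimal sketch (character-based; the token-based variants work the same way):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split `text` into fixed-size character chunks, each sharing
    `overlap` characters with the previous chunk so sentences cut at a
    boundary still appear whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 50, chunk_size=100, overlap=20)
print(len(chunks), len(chunks[0]))
```

Semantic and agentic splitters (first two links) replace the fixed `step` with boundaries chosen by embedding similarity or an LLM, but the overlap idea carries over.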

Ingestion - How to go about ingesting different file formats

RAG Frameworks

  • https://github.com/SciPhi-AI/R2R/tree/main
  • https://embedchain.ai/

Existing RAG Solutions

  • Using Supabase
    • https://supabase.com/docs/guides/ai/rag-with-permissions
    • https://supabase.com/docs/guides/ai/langchain?database-method=sql
    • https://supabase.com/docs/guides/ai/integrations/llamaindex
    • https://www.neum.ai/
      • https://www.neum.ai/post/real-time-data-embedding-and-indexing-for-rag-with-neum-and-supabase
      • https://medium.com/@neum_ai/building-scalable-rag-pipelines-with-neum-ai-framework-part-1-859837786977
      • https://github.com/NeumTry/NeumAI
      • https://medium.com/@neum_ai/retrieval-augmented-generation-at-scale-building-a-distributed-system-for-synchronizing-and-eaa29162521
    • https://github.com/supabase-community/chatgpt-your-files
    • https://github.com/supabase-community/deno-fresh-openai-doc-search
    • https://github.com/different-ai/embedbase
      • https://github.com/different-ai/embedbase/tree/main/supabase
    • https://github.com/QuivrHQ/quivr
      • https://github.com/QuivrHQ/quivr/tree/65c0ed505e4ce23b09c9f2becc20fe4797bbd219?tab=readme-ov-file#60-seconds-installation-
      • https://github.com/QuivrHQ/quivr/tree/65c0ed505e4ce23b09c9f2becc20fe4797bbd219/supabase
    • https://github.com/gannonh/chatgpt-pgvector
  • Other Solutions
    • https://github.com/weaviate/Verba
    • https://github.com/RafayKhattak/LlamaDoc
    • https://github.com/tensorlakeai/indexify
    • https://github.com/cpacker/MemGPT?tab=readme-ov-file
    • https://github.com/zilliztech/GPTCache
    • https://github.com/deepset-ai/haystack
    • https://github.com/neuml/txtai
    • https://github.com/junruxiong/IncarnaMind/

Examples

  • https://www.sbert.net/examples/applications/semantic-search/README.html#semantic-search
  • https://www.sbert.net/examples/applications/retrieve_rerank/README.html#retrieve-re-rank
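
The retrieve & re-rank pattern from the second SBERT example: a cheap first stage over the whole corpus, then a more expensive scorer over the short list. In this sketch, word overlap stands in for the bi-encoder and a length-normalized overlap for the cross-encoder; the corpus is made up:

```python
def retrieve(query, docs, k=3):
    """First stage: cheap word-overlap score (stand-in for a bi-encoder)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query, candidates):
    """Second stage: pricier score on the short list only (stand-in for a
    cross-encoder). Here: overlap normalized by length, favoring focused hits."""
    q = set(query.lower().split())
    def score(d):
        toks = d.lower().split()
        return len(q & set(toks)) / (1 + len(toks))
    return sorted(candidates, key=score, reverse=True)

docs = [
    "pgvector adds vector search to postgres",
    "postgres is a relational database",
    "the weather is nice today",
    "vector search finds similar embeddings",
]
ranked = rerank("vector search postgres", retrieve("vector search postgres", docs))
print(ranked[0])
```

The structure is the point: the expensive scorer only ever sees `k` candidates, which is what makes cross-encoder reranking affordable. The reranker scores are also a natural source for the relevance-confidence signal discussed under Considerations.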

Docs/Research

  • https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/
  • https://docs.arize.com/phoenix/retrieval/concepts-retrieval/benchmarking-retrieval-rag
  • https://www.solita.fi/blogs/building-robust-language-models-with-rag-one-pitfall-at-a-time/

CollectiveUnicorn avatar Mar 29 '24 22:03 CollectiveUnicorn

The llama-index guide just uses Vecs under the hood: https://github.com/supabase/vecs

We probably need to use something a little more robust than Vecs.

gphorvath avatar Apr 02 '24 00:04 gphorvath

PGVector Performance

  • https://jkatz05.com/post/postgres/pgvector-performance-150x-speedup/
  • https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
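
The binary-quantization idea from the second post can be illustrated in a few lines: keep one sign bit per embedding dimension and compare with Hamming distance, trading recall for a 32x smaller index and very cheap comparisons (toy 4-dimensional vectors below; real embeddings have hundreds of dimensions, and pgvector does this server-side):

```python
def binarize(vec):
    """Binary-quantize an embedding: one bit per dimension (the sign)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between two bit-packed vectors."""
    return bin(a ^ b).count("1")

q  = binarize([0.8, -0.1, 0.3, -0.5])
d1 = binarize([0.9, -0.2, 0.4, -0.4])   # similar direction to q
d2 = binarize([-0.7, 0.6, -0.2, 0.3])   # roughly opposite direction
print(hamming(q, d1), hamming(q, d2))
```

In practice the quantized index is used for a fast candidate pass and the full-precision vectors rescore the top hits, mirroring the retrieve & re-rank pattern above.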

CollectiveUnicorn avatar Aug 26 '24 20:08 CollectiveUnicorn