[FEATURE] Add advanced RAG with Azure AI Search

Open jdubois opened this issue 1 year ago • 2 comments

Now that I have added Azure AI Search vector search support in #530, and advanced RAG was merged in #538, my goal is to add hybrid search and semantic reranking to LangChain4J.

In the JavaScript version at https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/azure_aisearch.ts they basically have the 3 search modes as 3 functions next to each other:

  • similaritySearchVectorWithScore is the current vector search we already have
  • hybridSearchVectorWithScore is vector search + text search
  • semanticHybridSearchVectorWithScore is hybrid search + reranking

Question 1: in the last 2 cases, the parallel searches and the reranking are done inside Azure AI Search, so how should this be implemented in LangChain4J? As far as I understand, this would bypass the current "advanced RAG" model.

Question 2: is it OK to implement this the same way as in JavaScript? What I don't like is that the 2 new search functions won't be part of the EmbeddingStore interface. Or should I extend AzureAiSearchEmbeddingStore to create 2 new implementations?

jdubois, Jan 31 '24 17:01

Update: I'm currently doing an implementation similar to the JavaScript one, where an AzureAISearchQueryType parameter in the AzureAiSearchEmbeddingStore lets you choose which of the 3 search types you want to use.
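
A rough sketch of what such a query type parameter could look like, just to make the idea concrete. Only the name AzureAISearchQueryType comes from this comment; the constant names, the featuresFor helper, and the feature labels are assumptions made up for illustration, not the actual LangChain4J or Azure API.

```java
import java.util.List;

class QueryTypeSketch {

    // Hypothetical constants: one per search mode listed in the issue description.
    enum AzureAISearchQueryType {
        VECTOR,          // vector search only (what exists today)
        HYBRID,          // vector search + text search
        SEMANTIC_HYBRID  // hybrid search + reranking
    }

    // Hypothetical helper: lists which server-side features a query type would enable.
    static List<String> featuresFor(AzureAISearchQueryType type) {
        switch (type) {
            case VECTOR:
                return List.of("vector");
            case HYBRID:
                return List.of("vector", "text");
            case SEMANTIC_HYBRID:
                return List.of("vector", "text", "rerank");
            default:
                throw new IllegalArgumentException("Unknown query type: " + type);
        }
    }

    public static void main(String[] args) {
        // Print the features each mode would switch on.
        for (AzureAISearchQueryType type : AzureAISearchQueryType.values()) {
            System.out.println(type + " -> " + featuresFor(type));
        }
    }
}
```

With a single parameter like this, the store keeps one search entry point and the mode becomes configuration, instead of 3 separate functions as in the JavaScript version.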

jdubois, Feb 01 '24 10:02

Update 2: I totally missed that for hybrid search you also need the content (the query text), not just its embedding! Otherwise it wouldn’t be hybrid 🤣 So this can’t work with the current interface. I’ll have an (imperfect) first draft PR today.
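
To illustrate the gap, here is a minimal sketch, assuming a search request that carries both inputs. SearchRequest and supportsHybrid are hypothetical names invented for this example and are not part of LangChain4J; the point is only that the text half of a hybrid query has nothing to work with when the caller passes an embedding alone.

```java
import java.util.Objects;

class HybridInputSketch {

    // Hypothetical request holder, not a LangChain4J type.
    static final class SearchRequest {
        final float[] queryEmbedding; // enough for pure vector search
        final String queryText;       // additionally needed for the text half of hybrid search

        SearchRequest(float[] queryEmbedding, String queryText) {
            this.queryEmbedding = Objects.requireNonNull(queryEmbedding, "queryEmbedding");
            this.queryText = queryText; // may be null when only vector search is wanted
        }

        // Hybrid search is only possible when the original query text is available.
        boolean supportsHybrid() {
            return queryText != null && !queryText.trim().isEmpty();
        }
    }

    public static void main(String[] args) {
        SearchRequest vectorOnly = new SearchRequest(new float[] {0.1f, 0.2f}, null);
        SearchRequest hybrid = new SearchRequest(new float[] {0.1f, 0.2f}, "azure ai search");

        System.out.println(vectorOnly.supportsHybrid()); // prints "false"
        System.out.println(hybrid.supportsHybrid());     // prints "true"
    }
}
```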

jdubois, Feb 01 '24 11:02