TaskWeaver

Multiple RAG as plugins

SingTeng opened this issue 1 year ago · 1 comment

I have multiple RAG pipelines that I want to implement in TaskWeaver as plugins.

RAG A will get data from Chromadb A. RAG B will get data from Chromadb B.

Hence I am looking at the 'multiple YAML files into one Python implementation' approach (https://microsoft.github.io/TaskWeaver/docs/plugin/multi_yaml_single_impl). All the code should be the same; the only difference is the vector store.

I also saw in your earlier RAG plugin example, `document_retriever.py`, that the vector store is initialized only once: if it has already been initialized, it is not initialized again for subsequent queries. (Do I have the correct understanding?)

```python
import os
import pickle

from taskweaver.plugin import Plugin, register_plugin


@register_plugin
class DocumentRetriever(Plugin):
    vectorstore = None

    def _init(self):
        try:
            import tiktoken
            from langchain_community.embeddings import HuggingFaceEmbeddings
            from langchain_community.vectorstores import FAISS
        except ImportError:
            raise ImportError("Please install langchain-community first.")

        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        # Load the FAISS index from the folder configured in this plugin's YAML.
        self.vectorstore = FAISS.load_local(
            folder_path=self.config.get("index_folder"),
            embeddings=self.embeddings,
        )
        # Load the chunk_id_to_index mapping stored next to the index.
        with open(
            os.path.join(self.config.get("index_folder"), "chunk_id_to_index.pkl"),
            "rb",
        ) as f:
            self.chunk_id_to_index = pickle.load(f)

        self.enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def __call__(self, query: str, size: int = 5, target_length: int = 256):
        # The part I am asking about: the index is loaded on the first call only.
        if self.vectorstore is None:
            self._init()
        ...  # rest of the retrieval logic not shown here
```
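The lazy initialization in the snippet above works per instance, not per class. The plain-Python sketch below illustrates why; it does not use TaskWeaver, FAISS, or Chroma, and the `Retriever` class, the `index_folder` paths, and the string standing in for a vector store are all hypothetical. `vectorstore = None` at class level is only a default: assigning `self.vectorstore` inside `_init` creates an instance attribute that shadows it, so each instance loads its own store once and other instances are unaffected.

```python
class Retriever:
    # Class-level default, seen by every instance until that instance shadows it.
    vectorstore = None

    def __init__(self, config):
        self.config = config
        self.init_calls = 0  # counts how often the (expensive) setup runs

    def _init(self):
        # Hypothetical stand-in for loading an index from config["index_folder"].
        self.init_calls += 1
        # Assigning through `self` creates an instance attribute; the class
        # attribute and other instances stay untouched.
        self.vectorstore = f"store@{self.config['index_folder']}"

    def __call__(self, query):
        if self.vectorstore is None:  # True only on this instance's first call
            self._init()
        return f"{self.vectorstore}: {query}"


rag_a = Retriever({"index_folder": "/data/chroma_a"})
rag_b = Retriever({"index_folder": "/data/chroma_b"})

print(rag_a("q1"))  # store@/data/chroma_a: q1
print(rag_a("q2"))  # store@/data/chroma_a: q2  (no re-initialization)
print(rag_b("q1"))  # store@/data/chroma_b: q1
print(rag_a.init_calls, rag_b.init_calls)  # 1 1
print(Retriever.vectorstore)  # None
```

Because each plugin gets its own instance, the "initialize once" check guards only that instance's store.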

So my question is: how does the 'multiple YAML files into one Python implementation' approach work with the implementation above when it comes to initializing the vector store? Would it initialize the vector store with Chromadb A once, and that's it? When I use the same code for RAG B (which expects Chromadb B), would it still use Chromadb A, since the vector store was already initialized the first time round?

Do you have other suggestions for implementing RAG plugins that query different vector stores?

Hopefully my question is not too confusing. Thanks in advance.

SingTeng, May 22 '24 07:05

This is a great question to discuss.

First, although there is only one Python script for the two plugins, they are initialized as two different object instances, so they do not interfere with each other. They should have two different YAML files, and thus two different plugin names. In the generated code, they therefore appear as two different functions. The connections to Chromadb A and Chromadb B are kept in the two instances, respectively.
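To make this concrete, here is a framework-free sketch of the pattern described above. It is not TaskWeaver's actual plugin loader: `RetrieverPlugin`, `load_plugins`, `PLUGIN_CONFIGS`, and the `db_path` key are hypothetical, the two dicts stand in for the two YAML files, and a string stands in for a real Chroma connection.

```python
from typing import Callable, Dict


class RetrieverPlugin:
    """One shared implementation; behaviour differs only through `config`."""

    def __init__(self, name: str, config: Dict[str, str]):
        self.name = name
        self.config = config
        self.vectorstore = None  # opened lazily, once per instance

    def __call__(self, query: str) -> str:
        if self.vectorstore is None:
            # Hypothetical placeholder for opening the store named in config.
            self.vectorstore = self.config["db_path"]
        return f"[{self.name}] searched {self.vectorstore} for {query!r}"


# Stand-ins for two YAML files that point at the same Python implementation.
PLUGIN_CONFIGS = {
    "rag_a": {"db_path": "/data/chroma_a"},
    "rag_b": {"db_path": "/data/chroma_b"},
}


def load_plugins(configs: Dict[str, Dict[str, str]]) -> Dict[str, Callable[[str], str]]:
    # One separate instance per plugin name, so no state is shared between them.
    return {name: RetrieverPlugin(name, cfg) for name, cfg in configs.items()}


plugins = load_plugins(PLUGIN_CONFIGS)
rag_a, rag_b = plugins["rag_a"], plugins["rag_b"]

print(rag_a("refund policy"))  # [rag_a] searched /data/chroma_a for 'refund policy'
print(rag_b("refund policy"))  # [rag_b] searched /data/chroma_b for 'refund policy'
```

Calling `rag_a` first does not affect `rag_b`: each name maps to its own object, which keeps its own connection.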

Second, I would suggest you take a look at the Role concept in TaskWeaver (see the blog post). On the main branch, we have already re-implemented the document_retriever plugin as a role in the `taskweaver/ext_role/` folder. I explain in detail why we made this change in the linked blog post.

liqul, May 22 '24 08:05