TaskWeaver
Multiple RAG as plugins
I have multiple RAG pipelines that I want to implement in TaskWeaver as plugins.
RAG A will get data from Chromadb A. RAG B will get data from Chromadb B.
So I am looking at the 'multiple YAML files to one Python implementation' approach (https://microsoft.github.io/TaskWeaver/docs/plugin/multi_yaml_single_impl). All the code should be the same; the only difference is the vectorstore.
I also saw in your earlier RAG plugin example, 'document_retriever.py', that the vectorstore is initialized only once and is not initialized again for subsequent queries. (Is my understanding correct?)
```python
import os
import pickle

from taskweaver.plugin import Plugin, register_plugin


@register_plugin
class DocumentRetriever(Plugin):
    vectorstore = None

    def _init(self):
        try:
            import tiktoken
            from langchain_community.embeddings import HuggingFaceEmbeddings
            from langchain_community.vectorstores import FAISS
        except ImportError:
            raise ImportError("Please install langchain-community first.")

        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectorstore = FAISS.load_local(
            folder_path=self.config.get("index_folder"),
            embeddings=self.embeddings,
        )
        with open(
            os.path.join(
                self.config.get("index_folder"),
                "chunk_id_to_index.pkl",
            ),
            "rb",
        ) as f:
            self.chunk_id_to_index = pickle.load(f)

        self.enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def __call__(self, query: str, size: int = 5, target_length: int = 256):
        # The lazy-initialization check this question is about:
        if self.vectorstore is None:
            self._init()
```
So my question is: how does 'multiple YAML files to one Python implementation' work with the above implementation when it comes to initializing the vectorstore? Would it initialize the vectorstore as Chromadb A once, and that's it? When I use the same code for RAG B (which expects Chromadb B), would it still use Chromadb A because that was initialized the first time round?
Do you have other suggestions when it comes to implementing RAGs which query from different vectorstores?
Hopefully my question is not too confusing. Thanks in advance.
This is a great question to discuss.
First, although there is only one Python script for the two plugins, they are initialized as two different object instances, so they do not interfere with each other. They should have two different YAML files and therefore two different plugin names, so in the generated code they are two different functions. The connections to Chromadb A and Chromadb B are kept in the two instances, respectively.
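To illustrate the point about separate instances, here is a minimal sketch in plain Python. It does not use TaskWeaver's plugin loader; `FakeVectorStore`, the `init_count` counter, and the `chroma_a` / `chroma_b` folder names are hypothetical stand-ins. It assumes, as described above, that each YAML file produces its own instance with its own config. Although `vectorstore = None` is declared at class level, assigning `self.vectorstore` inside `_init` creates an attribute on that instance only, so the other instance still sees `None` and initializes its own store.

```python
from typing import Any, Dict, List, Optional


class FakeVectorStore:
    """Hypothetical stand-in for a real vector store (e.g. a Chroma collection)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def search(self, query: str) -> List[str]:
        return [f"{self.path}: result for {query!r}"]


class DocumentRetriever:
    # Class-level default, mirroring the snippet in the question.
    vectorstore: Optional[FakeVectorStore] = None

    def __init__(self, name: str, config: Dict[str, Any]) -> None:
        # Assumption: one instance is created per YAML file, each with its own config.
        self.name = name
        self.config = config
        self.init_count = 0  # illustration only: counts how often _init runs

    def _init(self) -> None:
        # Assigning through `self` creates an instance attribute;
        # the class-level default stays None for other instances.
        self.vectorstore = FakeVectorStore(self.config["index_folder"])
        self.init_count += 1

    def __call__(self, query: str) -> List[str]:
        if self.vectorstore is None:
            self._init()
        return self.vectorstore.search(query)


# Two instances of the same class, as if loaded from two YAML files.
rag_a = DocumentRetriever("rag_a", {"index_folder": "chroma_a"})
rag_b = DocumentRetriever("rag_b", {"index_folder": "chroma_b"})

results_a = rag_a("hello")
results_b = rag_b("hello")
rag_a("again")  # second call on rag_a reuses its already-initialized store
```

Each instance initializes its store once on first use and keeps it for later calls, while the class attribute itself remains `None`.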
Second, I would suggest taking a look at the Role concept in TaskWeaver (blog). On the main branch, we have already re-implemented the document_retriever plugin as a role in the taskweaver/ext_role/ folder. The linked blog explains in detail why we made this change.