langchain
Add ability to add extra fields to AzureSearch VectorStore when adding documents
Feature request
Currently the AzureSearch VectorStore allows the user to specify a filter that can be used to filter (in the traditional search engine sense) a search index before doing a vector similarity search. This reduces the search space to improve speed, and it also helps focus the vector search on the correct subset of documents.
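To make "filter first, then run the vector similarity search" concrete, here is a minimal, purely local sketch. It does not talk to Azure Cognitive Search. The record layout, the `filtered_vector_search` helper, and the brute-force cosine ranking are illustrative assumptions; the real service evaluates the filter and the vector search server-side.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical in-memory "index": each record has filterable fields plus a vector.
Record = Dict[str, object]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_vector_search(
    records: List[Record],
    query_vector: Sequence[float],
    predicate: Callable[[Record], bool],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Apply the filter first, then rank only the surviving records by similarity."""
    candidates = [r for r in records if predicate(r)]  # shrink the search space
    scored = [(str(r["id"]), cosine(query_vector, r["content_vector"])) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]
```

For example, with the predicate `lambda r: r["important_field_1"] == 123`, records whose `important_field_1` differs are never scored at all, which is exactly why a usable filterable field matters.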
This filtering feature is very hard to effectively use because the current method for adding documents (add_texts) only allows an id, content, content_vector, and metadata fields. None of these fields are suitable for filtering, so this requires the user to go back and add fields manually to the search index.
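The following sketch illustrates why the four fields are hard to filter on. It assumes, as the issue implies, that all metadata is stored in a single serialized string field rather than as separate index fields. `build_index_document` is a hypothetical helper approximating the shape of one uploaded document, not the actual `add_texts` code.

```python
import json
from typing import Any, Dict, List, Optional


def build_index_document(
    key: str,
    text: str,
    vector: List[float],
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assumed shape of one document uploaded by add_texts (illustrative only)."""
    return {
        "id": key,
        "content": text,
        "content_vector": vector,
        # All metadata is collapsed into one JSON string, so individual keys
        # are not separate index fields and cannot be targeted by `eq` filters.
        "metadata": json.dumps(metadata or {}),
    }
```

With `metadata={"important_field_1": 123}`, the value ends up inside the string `'{"important_field_1": 123}'`, so a filter such as `important_field_1 eq 123` has no top-level field to match against.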
I propose that we allow the end user to specify extra fields that are added when creating these vectors. The end user would do something like this:
extra_fields = {"extra_fields": {"important_field_1": 123, "important_field_2": 456}}
documents = []
documents.append(doc1)
documents.append(doc2)
documents.append(doc3)
vector_store.add_documents(documents, **extra_fields)
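One possible way the proposed `extra_fields` keyword could be applied is sketched below: every document uploaded in a call receives the same extra top-level fields. This is only an illustration of the proposal (the issue was later closed in favour of #6459, which takes a different approach). `build_upload_batch`, the reserved-name check, and the metadata layout are all assumptions, not LangChain code.

```python
import json
from typing import Any, Dict, List, Optional


def build_upload_batch(
    keys: List[str],
    texts: List[str],
    vectors: List[List[float]],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    extra_fields: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical: merge the proposed extra_fields into every uploaded document."""
    metadatas = metadatas or [{} for _ in texts]
    extra_fields = extra_fields or {}
    # Refuse extra fields that would silently overwrite the built-in ones.
    reserved = {"id", "content", "content_vector", "metadata"}
    clash = reserved.intersection(extra_fields)
    if clash:
        raise ValueError(f"extra_fields may not override built-in fields: {sorted(clash)}")
    batch = []
    for key, text, vector, metadata in zip(keys, texts, vectors, metadatas):
        doc = {
            "id": key,
            "content": text,
            "content_vector": vector,
            "metadata": json.dumps(metadata),
        }
        # Each extra field becomes a top-level index field; the index schema
        # would still need to declare it filterable for `eq` queries to work.
        doc.update(extra_fields)
        batch.append(doc)
    return batch
```

Applying one dictionary to the whole batch mirrors the `**extra_fields` call in the proposal; per-document values would need a list of dictionaries instead.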
Then, when the user queries this vector store later, they can do something like this:
retriever.search_kwargs = {'filters': "important_field_1 eq 123"}
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
)
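The filter string above is written by hand. Assuming the `filters` value is passed through as an OData `$filter` expression (as the `eq` example suggests), a small hypothetical helper like `eq_filter` can build it from a dictionary and take care of literal formatting: strings in single quotes with embedded quotes doubled, booleans as lowercase `true`/`false`.

```python
from typing import Dict, Union

Scalar = Union[str, int, float, bool]


def odata_literal(value: Scalar) -> str:
    """Render a Python scalar as an OData literal."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # OData string literals use single quotes; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def eq_filter(conditions: Dict[str, Scalar]) -> str:
    """Hypothetical helper: AND together `field eq value` clauses."""
    return " and ".join(f"{field} eq {odata_literal(v)}" for field, v in conditions.items())
```

It would be used in place of the literal string, e.g. `retriever.search_kwargs = {'filters': eq_filter({"important_field_1": 123})}`, which yields `"important_field_1 eq 123"`.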
Motivation
My motivation was a need for a project I'm working on, but I felt this was a generally needed feature. As I stated in the feature request:
This filtering feature is very hard to effectively use because the current method for adding documents (add_texts) only allows an id, content, content_vector, and metadata fields. None of these fields are suitable for filtering, so this requires the user to go back and add fields manually to the search index.
Your contribution
Hopefully this makes sense; let me know if any clarifications are needed. Once bug #6131 is fixed I will submit a PR that implements this. I have it working locally and just need to write appropriate unit tests, which will not be possible until that bug is fixed.
@ruoccofabrizio, since you are the original implementer of AzureSearch.py, can you validate my thinking?
Had the same issue this morning and yesterday afternoon. @hwchase17 @ruoccofabrizio, both this and #6132 would be great additions.
Agreed. The filter is already passed in, but without the ability to add fields while adding texts, it's of little use.
Hi all, just wondering if there was any update on this?
I've got it implemented, but it's dependent on #6132 getting merged first. I'm unsure whether I missed something on that PR or whether it is just lost in the mix of so many things waiting to be merged. Hopefully someone will have a chance to look at it soon.
Hi @CameronVetter -- Great job on getting #6132 merged in!
Do you think we can get the PR that adds the ability to add extra fields to the Azure Search vector store? 🙏
Hi, @CameronVetter! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you requested the ability to add extra fields to the AzureSearch VectorStore when adding documents. It seems that you have implemented the feature, but it is dependent on another PR being merged first. One user has commended your work and is hopeful that the feature will be added soon.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution!
#6459 has added this feature in a different way, which fulfills the need. Closing this issue.
Thank you, @CameronVetter, for closing the issue. We appreciate your contribution to the LangChain repository!