
Add ability to add extra fields to AzureSearch VectorStore when adding documents

Open CameronVetter opened this issue 1 year ago • 4 comments

Feature request

Currently the AzureSearch VectorStore allows the user to specify a filter that can be used to filter (in the traditional search engine sense) a search index before doing a vector similarity search. This reduces the search space, which improves speed and helps focus the vector search on the correct subset of documents.
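The filter-then-rank idea can be illustrated with a toy, in-memory sketch. This is not how Azure Cognitive Search executes queries: `filtered_vector_search` and the sample documents are hypothetical, and plain cosine similarity stands in for whatever similarity metric the index is actually configured with.

```python
import math
from typing import Callable, Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_vector_search(
    docs: List[Dict],
    query_vector: List[float],
    predicate: Callable[[Dict], bool],
    k: int = 3,
) -> List[Tuple[str, float]]:
    # Step 1: the filter shrinks the candidate set (like a search-engine filter)
    candidates = [d for d in docs if predicate(d)]
    # Step 2: only the surviving documents are ranked by vector similarity
    scored = [(d["id"], cosine(d["content_vector"], query_vector)) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [
    {"id": "a", "content_vector": [1.0, 0.0], "important_field_1": 123},
    {"id": "b", "content_vector": [0.9, 0.1], "important_field_1": 999},
    {"id": "c", "content_vector": [0.0, 1.0], "important_field_1": 123},
]
results = filtered_vector_search(docs, [1.0, 0.0], lambda d: d["important_field_1"] == 123)
print(results)
```

Document `b` is a close match for the query vector but is never scored, because the filter removed it first.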

This filtering feature is very hard to use effectively because the current method for adding documents (add_texts) only populates the id, content, content_vector, and metadata fields. None of these fields is suitable for filtering, so the user has to go back and add fields to the search index manually.
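A minimal sketch of why the metadata field does not help with filtering, assuming metadata is serialized into a single JSON string field when uploaded. `build_search_document` is a hypothetical helper, not part of LangChain.

```python
import json
from typing import Any, Dict, List


def build_search_document(
    doc_id: str, text: str, vector: List[float], metadata: Dict[str, Any]
) -> Dict[str, Any]:
    # Mirrors the four fields named above; metadata is assumed to be stored
    # as one serialized JSON string rather than as separate index fields
    return {
        "id": doc_id,
        "content": text,
        "content_vector": vector,
        "metadata": json.dumps(metadata),
    }


doc = build_search_document("1", "hello", [0.1, 0.2], {"important_field_1": 123})
# There is no top-level "important_field_1" key, so a filter such as
# "important_field_1 eq 123" has no index field to match against
print(sorted(doc))
print("important_field_1" in doc)
```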

I propose that we allow the end user to specify extra fields that are added when creating these vectors. The end user would do something like this:

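```python
# Proposed usage: extra_fields would be forwarded through **kwargs to add_texts
extra_fields = {"extra_fields": {"important_field_1": 123, "important_field_2": 456}}

# doc1, doc2 and doc3 are LangChain Document objects
documents = []
documents.append(doc1)
documents.append(doc2)
documents.append(doc3)

# vector_store is an existing AzureSearch vector store
vector_store.add_documents(documents, **extra_fields)
```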
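One way the proposed `extra_fields` keyword could be handled inside add_texts is to merge it into every uploaded document as top-level fields. The sketch below is hypothetical (`build_upload_batch` is not a LangChain function) and assumes the extra fields already exist, with the filterable attribute, in the index schema; otherwise the service would reject the upload.

```python
import json
from typing import Any, Dict, List, Optional


def build_upload_batch(
    ids: List[str],
    texts: List[str],
    vectors: List[List[float]],
    metadatas: List[Dict[str, Any]],
    extra_fields: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    # Hypothetical: how add_texts could merge the proposed extra_fields kwarg
    batch = []
    for doc_id, text, vector, metadata in zip(ids, texts, vectors, metadatas):
        doc = {
            "id": doc_id,
            "content": text,
            "content_vector": vector,
            "metadata": json.dumps(metadata),
        }
        # Extra fields become top-level (and therefore filterable) index fields
        doc.update(extra_fields or {})
        batch.append(doc)
    return batch


batch = build_upload_batch(
    ["1", "2"],
    ["first", "second"],
    [[0.1, 0.2], [0.3, 0.4]],
    [{}, {}],
    extra_fields={"important_field_1": 123, "important_field_2": 456},
)
```

Note that every document in the call receives the same extra values, which matches the proposal's single `extra_fields` dictionary per `add_documents` call.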

Then, when the user later queries this vector store, they can do something like this:

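```python
from langchain.chains import RetrievalQA

# retriever comes from the AzureSearch vector store; llm is any LangChain LLM
# The filter string uses OData syntax and narrows the index before vector search
retriever.search_kwargs = {"filters": "important_field_1 eq 123"}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
```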
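Filter strings like `important_field_1 eq 123` follow OData syntax. A small hypothetical helper, `build_filter`, can assemble simple equality filters from a dictionary; it is only a sketch covering `eq` joined with `and`, with string literals single-quoted and embedded quotes doubled as OData requires.

```python
from typing import Any, Dict


def odata_literal(value: Any) -> str:
    # Booleans must be checked before ints because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # OData string literals use single quotes; embedded quotes are doubled
    return "'" + str(value).replace("'", "''") + "'"


def build_filter(conditions: Dict[str, Any]) -> str:
    # Joins simple equality checks with "and"
    return " and ".join(
        f"{field} eq {odata_literal(value)}" for field, value in conditions.items()
    )


search_kwargs = {"filters": build_filter({"important_field_1": 123})}
print(search_kwargs)
```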

Motivation

My motivation was a need in a project I'm working on, but I felt this was a generally needed feature. As I stated in the feature request:

> This filtering feature is very hard to use effectively because the current method for adding documents (add_texts) only populates the id, content, content_vector, and metadata fields. None of these fields is suitable for filtering, so the user has to go back and add fields to the search index manually.

Your contribution

Hopefully this makes sense; let me know if any clarifications are needed. Once bug #6131 is fixed, I will submit a PR that implements this. I have it working locally and just need to write appropriate unit tests, which will not be possible until that bug is fixed.

CameronVetter, Jun 14 '23 03:06

@ruoccofabrizio, since you are the original implementer of AzureSearch.py, can you validate my thinking?

CameronVetter, Jun 14 '23 03:06

I had the same issue this morning and yesterday afternoon. @hwchase17 @ruoccofabrizio, both this and #6132 would be great additions.

SimplyJuanjo, Jun 14 '23 09:06

Agreed. The filter is already passed in, but without the ability to add fields while adding texts, it's of little use.

cticevans, Jun 15 '23 02:06

Hi all, just wondering if there was any update on this?

jbueza-railtownai, Jun 16 '23 17:06

> Hi all, just wondering if there was any update on this?

I've got it implemented, but it's dependent on #6132 getting merged first. I'm unsure whether I missed something on that PR or whether it is just lost among the many things waiting to be merged. Hopefully someone will have a chance to look at it soon.

CameronVetter, Jun 16 '23 18:06

Hi @CameronVetter -- Great job on getting #6132 merged in!

Do you think we can now get the PR that adds extra fields to the Azure Search vector store? 🙏

jbueza-railtownai, Jun 19 '23 22:06

Hi, @CameronVetter! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you requested the ability to add extra fields to the AzureSearch VectorStore when adding documents. It seems that you have implemented the feature, but it is dependent on another PR being merged first. One user has commended your work and is hopeful that the feature will be added soon.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution!

dosubot[bot], Oct 16 '23 16:10

#6459 has added this feature in a different way that fulfills the need. Closing this issue.
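For readers landing here: one way to meet the same need without a new keyword argument is to copy metadata keys that match fields declared in the index schema onto top-level, filterable fields. The sketch below illustrates that idea only; `to_search_document` and `RESERVED_FIELDS` are hypothetical names, and this is not taken from #6459, whose actual mechanism may differ.

```python
import json
from typing import Any, Dict, List, Set

# Core fields that promoted metadata must never overwrite
RESERVED_FIELDS = {"id", "content", "content_vector", "metadata"}


def to_search_document(
    doc_id: str,
    text: str,
    vector: List[float],
    metadata: Dict[str, Any],
    index_field_names: Set[str],
) -> Dict[str, Any]:
    # Keep the full metadata as a JSON string, as before
    doc = {
        "id": doc_id,
        "content": text,
        "content_vector": vector,
        "metadata": json.dumps(metadata),
    }
    # Promote metadata keys that match a field declared in the index schema
    for key, value in metadata.items():
        if key in index_field_names and key not in RESERVED_FIELDS:
            doc[key] = value
    return doc


doc = to_search_document(
    "1",
    "hello",
    [0.1, 0.2],
    {"important_field_1": 123, "source": "a.txt", "id": "x"},
    index_field_names={"id", "content", "content_vector", "metadata", "important_field_1"},
)
```

With this design the filterable value travels in ordinary document metadata, so a query filter such as `important_field_1 eq 123` works as long as the field was declared when the index was created.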

CameronVetter, Oct 16 '23 16:10

Thank you, @CameronVetter, for closing the issue. We appreciate your contribution to the LangChain repository!

dosubot[bot], Oct 16 '23 16:10