feat: filter on list of values

Open vibha0411 opened this issue 1 year ago • 4 comments

Issue

The old filter could only filter on a single value, as it used the ElasticSearch match query.

Description

This change would allow the user to filter on a list of values using the ElasticSearch terms query. Fixes: https://github.com/hwchase17/langchain/issues/2095#issuecomment-1536225159
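A minimal sketch of the idea described above, not the code from the actual change: list-valued filter entries become an Elasticsearch terms clause, while single values keep using match. The helper name `build_filter_clauses` and the `metadata.` field prefix are assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_filter_clauses(
    filter: Dict[str, Any], prefix: str = "metadata."
) -> List[Dict[str, Any]]:
    """Translate a metadata filter into Elasticsearch query clauses.

    Hypothetical helper: the name and the "metadata." prefix are
    illustrative assumptions, not part of the langchain API.
    """
    clauses: List[Dict[str, Any]] = []
    for key, value in filter.items():
        field = f"{prefix}{key}"
        if isinstance(value, (list, tuple, set)):
            # A terms query matches documents whose field equals any listed value.
            clauses.append({"terms": {field: list(value)}})
        else:
            # A single value falls back to the match query used by the old filter.
            clauses.append({"match": {field: value}})
    return clauses


clauses = build_filter_clauses({"page": [5, 6], "source": "report"})
```

The resulting clauses could then be placed in the filter section of a bool query, so a document passes only if it satisfies every entry.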

vibha0411 avatar May 05 '23 14:05 vibha0411

could we add an integration test with a filter to make sure things work?

dev2049 avatar May 05 '23 20:05 dev2049

I have written an integration test with a filter.

def test_filter_query(self, elasticsearch_url: str) -> None:
    texts = ["foo", "bar", "baz", "hello", "igloo", "sharks"]
    metadatas = [{"page": i} for i in range(len(texts))]
    docsearch = ElasticVectorSearch.from_texts(
        texts,
        FakeEmbeddings(),
        metadatas=metadatas,
        elasticsearch_url=elasticsearch_url,
    )
    search_result = docsearch.similarity_search_by_vector(
        "sharks", k=1, filter={"page": [5, 6]}
    )
    assert len(search_result) != 0

However, I am unable to test it: when I run make integration_tests, I get the following error. [image: langchain_issue] Could I please get some help with this?

vibha0411 avatar May 07 '23 19:05 vibha0411

Hey everyone, there is a workaround for this filtering.

vec = VectorStoreRetriever(vectorstore=vectorstore, search_kwargs={"where_document": {"$or": [{"$contains": "search_string_1"}, {"$contains": "search_string_2"}]}})

pedrobuenoxs avatar May 08 '23 18:05 pedrobuenoxs

@vibha0411 you probably don't want to run all integration tests (that requires a bunch of optional imports and access tokens). you can run just the file you've changed with

pytest tests/integration_tests/<PATH_TO_FILE>.py

@pedrobuenoxs very cool, good to know! my sense is @vibha0411's change may still be good for ease of use

dev2049 avatar May 08 '23 19:05 dev2049

Stale; this Elastic vector store is being deprecated.

baskaryan avatar Aug 11 '23 20:08 baskaryan