[Question] How to use Qdrant MatchAny filter?

Open YLFxGen opened this issue 2 years ago • 1 comments

Feature request

Qdrant allows you to set the conditions to be used when searching or retrieving points. The filter is currently passed as a MetadataFilter. Can we pass rest.Filter directly, so that we can use all the filters Qdrant provides?

def _qdrant_filter_from_dict(self, filter: Optional[MetadataFilter]) -> Any:
    if not filter:
        return None

    from qdrant_client.http import models as rest

    return rest.Filter(
        must=[
            condition
            for key, value in filter.items()
            for condition in self._build_condition(key, value)
        ]
    )

Motivation

I'm frustrated that I can't find a way to query only a few of the documents ingested into Qdrant. From my understanding, the current implementation only supports an 'and' operation across the metadata filters. Is it possible to perform an 'or' operation?

What I have:

retriever=qdrant_store.as_retriever(
    search_kwargs={
        "filter": {"source_file":"file_1.md"}
    }
),

What I want:

retriever=qdrant_store.as_retriever(
    search_kwargs={
        "filter": rest.Filter(
            must=[
                rest.FieldCondition(
                    key="source_file",
                    match=rest.MatchAny(any=["file_1.md", "file_2.md"]),
                )
            ]
        )
    }
),
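
For reference, Qdrant's filter format can express the 'or' above in two ways: a single condition with match.any, or a should clause containing one condition per value. The sketch below builds both as plain dicts in the shape of Qdrant's JSON filters, so it runs without qdrant_client installed. The helper names and the "metadata." key prefix are assumptions made for illustration, not part of langchain or qdrant_client.

```python
from typing import Any, Dict, List


def match_any(key: str, values: List[Any]) -> Dict[str, Any]:
    """Field condition: the payload value at `key` equals at least one of `values`."""
    return {"key": key, "match": {"any": list(values)}}


def match_value(key: str, value: Any) -> Dict[str, Any]:
    """Field condition: the payload value at `key` equals `value` exactly."""
    return {"key": key, "match": {"value": value}}


files = ["file_1.md", "file_2.md"]
# Assumption: metadata is stored under the "metadata" payload key.
key = "metadata.source_file"

# Option 1: one condition that accepts any of the listed values.
or_via_match_any = {"must": [match_any(key, files)]}

# Option 2: `should` passes when at least one of its conditions holds.
or_via_should = {"should": [match_value(key, f) for f in files]}
```

Since the qdrant_client models are pydantic models, something like rest.Filter(**or_via_should) should turn either dict into a filter object, but that is worth checking against the installed client version.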

Your contribution

N/A

YLFxGen avatar May 22 '23 01:05 YLFxGen

I'm new to Python, so I'm not sure whether this is the correct way to handle the issue. I resolved it by creating my own copy of qdrant.py and changing it slightly: I changed the type of the filter param in the similarity_search and similarity_search_with_score functions from MetadataFilter to Filter.

results = self.client.search(
    collection_name=self.collection_name,
    query_vector=self._embed_query(query),
    # change the line below to: query_filter=filter,
    query_filter=self._qdrant_filter_from_dict(filter),
    with_payload=True,
    limit=k,
)

Then, in qdrant.as_retriever(), you can pass a filter like the one below. The default metadata_payload_key is "metadata". You can also create a function in qdrant.py that prepends self.metadata_payload_key to the key in each field condition, so you don't need to include it here.

rest.Filter(
  must=[
    rest.FieldCondition(
      key=f"{metadata_payload_key}.source",
      match=rest.MatchAny(any=["file_1.md", "file_2.md"]),
    )
  ]
)
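
A minimal sketch of the prefixing function mentioned above, written against plain dicts in the shape of Qdrant's JSON filters so that it runs without qdrant_client. The name prefix_filter_keys is hypothetical, and the sketch assumes every must/should/must_not clause holds a list of conditions; anything else (for example has_id) is passed through untouched.

```python
from typing import Any, Dict

# Clause names used by Qdrant filters; each is assumed to hold a list of conditions.
CLAUSES = ("must", "should", "must_not")


def prefix_filter_keys(
    flt: Dict[str, Any], metadata_payload_key: str = "metadata"
) -> Dict[str, Any]:
    """Return a copy of a filter dict with every field key prefixed."""

    def fix(condition: Dict[str, Any]) -> Dict[str, Any]:
        if "key" in condition:
            # Field condition: rewrite "source" as "metadata.source".
            prefixed_key = f"{metadata_payload_key}.{condition['key']}"
            return {**condition, "key": prefixed_key}
        # Otherwise treat it as a nested filter and recurse into its clauses.
        return prefix_filter_keys(condition, metadata_payload_key)

    return {
        name: ([fix(c) for c in value] if name in CLAUSES else value)
        for name, value in flt.items()
    }


user_filter = {
    "must": [{"key": "source", "match": {"any": ["file_1.md", "file_2.md"]}}]
}
prefixed = prefix_filter_keys(user_filter)
```

The function returns a new dict rather than mutating its input, so the same user-facing filter can be reused with a different metadata_payload_key.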

Meanwhile, can anyone explain the reasoning behind the current approach in the source file? Why does it have the user pass a Dict and construct the models.Filter from it, instead of letting the user pass the Filter directly, which seems more flexible?

YLFxGen avatar May 23 '23 05:05 YLFxGen

@YLFxGen Support of Qdrant filters has been implemented in https://github.com/hwchase17/langchain/pull/5446.

kacperlukawski avatar May 31 '23 12:05 kacperlukawski