[Question] How to use Qdrant MatchAny filter?
Feature request
Qdrant allows you to set the conditions to be used when searching or retrieving points. The filter is currently passed as a MetadataFilter. Can we pass rest.Filter directly so that we can use all the filters Qdrant provides?
def _qdrant_filter_from_dict(self, filter: Optional[MetadataFilter]) -> Any:
    if not filter:
        return None

    from qdrant_client.http import models as rest

    return rest.Filter(
        must=[
            condition
            for key, value in filter.items()
            for condition in self._build_condition(key, value)
        ]
    )
Motivation
I'm struggling to restrict a query to only a few of the documents ingested in Qdrant. From my understanding, the current implementation only allows an 'and' operation across the metadata filters. Is it possible to perform an 'or' operation?
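To make the 'and' versus 'or' distinction concrete, here is a toy pure-Python evaluator. It is only an illustration of the semantics, not Qdrant's or LangChain's implementation: the `matches` function is hypothetical, it works on plain dicts, and it only models `must` with `value` and `any` matches on flat keys.

```python
from typing import Any, Dict


def _condition_holds(payload: Dict[str, Any], condition: Dict[str, Any]) -> bool:
    """Evaluate one field condition against a flat payload dict."""
    value = payload.get(condition["key"])
    match = condition["match"]
    if "any" in match:
        # MatchAny-style: true if the field equals any listed value ('or').
        return value in match["any"]
    # MatchValue-style: true only on exact equality.
    return value == match["value"]


def matches(payload: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """All conditions under 'must' have to hold ('and')."""
    return all(_condition_holds(payload, c) for c in flt.get("must", []))


docs = [
    {"source_file": "file_1.md"},
    {"source_file": "file_2.md"},
    {"source_file": "file_3.md"},
]

# Two exact matches on the same key under 'must' can never both hold.
and_filter = {
    "must": [
        {"key": "source_file", "match": {"value": "file_1.md"}},
        {"key": "source_file", "match": {"value": "file_2.md"}},
    ]
}

# A single 'any' match selects either file.
any_filter = {
    "must": [
        {"key": "source_file", "match": {"any": ["file_1.md", "file_2.md"]}}
    ]
}

and_hits = [d["source_file"] for d in docs if matches(d, and_filter)]
any_hits = [d["source_file"] for d in docs if matches(d, any_filter)]
```

With the 'and' filter nothing is selected, while the 'any' filter selects file_1.md and file_2.md, which is the behaviour I'm after.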
What I have:
retriever=qdrant_store.as_retriever(
    search_kwargs={
        "filter": {"source_file": "file_1.md"}
    }
),
What I want:
retriever=qdrant_store.as_retriever(
    search_kwargs={
        # Pass the Qdrant filter under the same "filter" key as before.
        "filter": rest.Filter(
            must=[
                rest.FieldCondition(
                    key="source_file",
                    match=rest.MatchAny(any=["file_1.md", "file_2.md"]),
                )
            ]
        )
    }
),
Your contribution
N/A
I'm new to Python, so I'm not sure whether this is the correct way to handle the issue. I resolved it by creating my own copy of qdrant.py and changing it a little.
Change the type of the filter param in the similarity_search and similarity_search_with_score functions from MetadataFilter to Filter.
results = self.client.search(
    collection_name=self.collection_name,
    query_vector=self._embed_query(query),
    query_filter=self._qdrant_filter_from_dict(filter),  # change this line to query_filter=filter
    with_payload=True,
    limit=k,
)
Then, in qdrant.as_retriever(), you can pass a filter like this. The default metadata_payload_key is "metadata". You can also create a function in qdrant.py that prepends self.metadata_payload_key to the key in each field condition, so you don't need to include it here.
rest.Filter(
    must=[
        rest.FieldCondition(
            key=f"{metadata_payload_key}.source",
            match=rest.MatchAny(any=["file_1.md", "file_2.md"]),
        )
    ]
)
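A rough sketch of the key-prefixing helper mentioned above. To keep it runnable without qdrant_client, it is written against the plain-dict (JSON) shape of a Qdrant filter rather than rest.Filter objects. The name `prefix_filter_keys` is hypothetical and not part of LangChain or qdrant_client. It only handles field conditions and nested filters under must / should / must_not, so other condition types pass through untouched.

```python
from copy import deepcopy
from typing import Any, Dict

# Clause names a Qdrant filter can carry.
_CLAUSES = ("must", "should", "must_not")


def _prefix_in_place(node: Dict[str, Any], payload_key: str) -> None:
    """Walk a dict-shaped filter and prefix each field condition key."""
    for clause in _CLAUSES:
        for condition in node.get(clause) or []:
            if "key" in condition:
                # Field condition: "source" becomes "metadata.source".
                condition["key"] = f"{payload_key}.{condition['key']}"
            else:
                # Possibly a nested filter: recurse into its clauses.
                _prefix_in_place(condition, payload_key)


def prefix_filter_keys(
    flt: Dict[str, Any], payload_key: str = "metadata"
) -> Dict[str, Any]:
    """Return a prefixed copy; the caller's filter is left unchanged."""
    prefixed = deepcopy(flt)
    _prefix_in_place(prefixed, payload_key)
    return prefixed


user_filter = {
    "must": [
        {"key": "source", "match": {"any": ["file_1.md", "file_2.md"]}}
    ]
}
prefixed_filter = prefix_filter_keys(user_filter)
```

Here `prefixed_filter` carries the key "metadata.source" while `user_filter` still has "source". Converting the resulting dict into a rest.Filter, or adapting the same walk to rest.Filter objects directly, is left out of this sketch.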
Meanwhile, can anyone explain the reasoning behind the current approach in the source file? Why does it have the user pass a Dict and construct the models.Filter from it, instead of letting the user pass the Filter directly, which seems more flexible?
@YLFxGen Support of Qdrant filters has been implemented in https://github.com/hwchase17/langchain/pull/5446.