annlite icon indicating copy to clipboard operation
annlite copied to clipboard

Filtering based on expressions

Open patelprateek opened this issue 2 years ago • 5 comments

New user here so please bear with me some naive questions.

Is it possible to have some user code , math expression based filtering while doing graph search (hnsw)

I read documentation , can you provide some guidance or pointer on why filtering query language was based on mongodb query operators , any merits here as opposed to supporting sql operators supported by different databases ?

Looking at the code , i observe filtering done by binary_fuse_filter , could you please provide some details on if the filtering is pre-filtering or post retrieval filtering or during graph traversal filtering ? are their any engineering blog or documentation or the filtering strategy that you can point me to , wondering how canwe extend for other expressions based filtering

patelprateek avatar Aug 27 '22 21:08 patelprateek

Thanks for your interest.

why filtering query language was based on mongodb query operators , any merits here as opposed to supporting sql operators supported by different databases ?

For the query, we use MongoDB query as the query language. At the runtime, the mongo query will be transferred to SQL clauses (we implemented a simple converting function in annlite) which will be applied to filter the whole data. Here, we want to provide an easy way to construct the query. SQL-based query needs the user to know all of the information about the table name (it is an internal variable, not visible to the general user) and column names. And what's more, mongodb-style query is a Dict, which is widely used by modern KV database (ES, MongoDB, ...).

could you please provide some details on if the filtering is pre-filtering or post retrieval filtering or during graph traversal filtering ?

It is graph traversal filtering indeed. Each node is marked with bool value to indicate whether it matches the query conditions. At the hnsw graph search time, the same traversal search algorithm is performed to find the closed-neighors, and only the valid node is considered as the candidate result to be returned (here, it seems like a pre-filtering strategy).

are their any engineering blog or documentation or the filtering strategy that you can point me to , wondering how canwe extend for other expressions based filtering

For the technical design documentation, we are working on them. And what's more, which expression do you want to use for filtering in your case?

numb3r3 avatar Aug 29 '22 03:08 numb3r3

Thanks for the info. the use case i am thinking is evaluating some expression like distance < 0.xyz and metadata_color = "red" . I was under the impressions that the filter is constructed first by running the operators over some indexed data generating a small fuse filter of allowed/blocklisted candidates , and then during traversal we are just doing a probabilistic check ? we are not evaluating operators on neighbours during graph search but rather checking membership ?

patelprateek avatar Aug 30 '22 04:08 patelprateek

we are using allowed/blocklisted candidates, and check membership during graph search.

numb3r3 avatar Sep 02 '22 02:09 numb3r3

Hi, great discussion! During searching with filters are there any chances of a disconnected graph cropping up - say when all the neighbors of a node are invalidated by a filter?

It is graph traversal filtering indeed. Each node is marked with bool value to indicate whether it matches the query conditions. At the hnsw graph search time, the same traversal search algorithm is performed to find the closed-neighors, and only the valid node is considered as the candidate result to be returned (here, it seems like a pre-filtering strategy).

bandhakavi avatar Oct 21 '22 20:10 bandhakavi

For your question, actually, all of the graph nodes are considered valid during graph traversal. Only the nodes which match with the filter will be considered as the result candidates. Thus, there is no disconnected graph problem.

numb3r3 avatar Nov 04 '22 03:11 numb3r3