Support for more Python types (list, set, etc) in the filtering columns

Open mirzakhalov opened this issue 2 years ago • 3 comments

https://github.com/jina-ai/annlite/blob/e4e706e313ba5cbfb7083a5dea9e75b8d2813394/executor/executor.py#L59

mirzakhalov, Jul 03 '22 06:07

Hey @mirzakhalov,

Could you please provide more details on the desired feature?

JoanFM, Jul 03 '22 14:07

So, the primary use case we are looking for is support for 'list'. This would allow us to use MongoDB's $in operator (similar to this). For example, in a large database where documents can be owned by multiple accounts, we will be storing the user_ids in a list in the document's tags. We would want to be able to pre-filter the search by user_id. If there are any other suggestions on how to achieve this, let us know.

mirzakhalov, Jul 03 '22 22:07

@mirzakhalov Thanks for pointing this out. Unfortunately, there is no trivial way to handle your case in AnnLite. Only scalar data types are supported, because the current pre-filtering is implemented with SQLite as the underlying engine.

For your case, could the pre-filtering happen outside AnnLite? The pre-filtering results could then be passed to AnnLite for efficient similarity search:

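# pseudo-code
indexer = AnnLite(....)
# pre-filter outside AnnLite, e.g. with a MongoDB $in query
doc_ids = mongodb.filter({"$in": {....}})

# run the similarity search only over the pre-filtered documents
result = indexer.search(query, valid_docs=doc_ids)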

numb3r3, Jul 04 '22 03:07
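
The two-step idea suggested above (pre-filter on a list-valued tag outside the index, then search only the surviving documents) can be sketched in plain Python. This is a minimal illustration and uses neither AnnLite nor MongoDB: `DOCS`, `prefilter_ids`, `cosine`, and `search` are hypothetical names, the membership test only mimics what `$in` does on an array field, and the brute-force ranking stands in for a real approximate nearest-neighbour search.

```python
import math

# Hypothetical in-memory corpus: each document has an id, an embedding,
# and a list-valued "user_ids" tag (the case discussed in this thread).
DOCS = [
    {"id": "d1", "embedding": [1.0, 0.0], "tags": {"user_ids": ["u1", "u2"]}},
    {"id": "d2", "embedding": [0.9, 0.1], "tags": {"user_ids": ["u3"]}},
    {"id": "d3", "embedding": [0.0, 1.0], "tags": {"user_ids": ["u1"]}},
]


def prefilter_ids(docs, allowed_user_ids):
    """Return ids of docs whose user_ids list shares a member with allowed_user_ids.

    This mimics the effect of MongoDB's $in on an array field.
    """
    allowed = set(allowed_user_ids)
    return [d["id"] for d in docs if allowed.intersection(d["tags"]["user_ids"])]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, query, valid_ids, limit=10):
    """Brute-force similarity search restricted to the pre-filtered ids."""
    valid = set(valid_ids)
    candidates = [d for d in docs if d["id"] in valid]
    ranked = sorted(
        candidates, key=lambda d: cosine(query, d["embedding"]), reverse=True
    )
    return [d["id"] for d in ranked[:limit]]


# Step 1: pre-filter by owner, outside the vector index.
doc_ids = prefilter_ids(DOCS, ["u1"])
# Step 2: similarity search over the pre-filtered documents only.
result = search(DOCS, [1.0, 0.0], doc_ids)
print(result)
```

Here `d2` never reaches the ranking step even though its embedding is close to the query, because it is not owned by `u1`. In a real deployment the first step would be a database query and the second a call into the vector index, restricted to the returned ids.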