[Feature Request]: Iterative query results for post search filtering
Describe the problem
On larger datasets, the current "where" filtering is very slow. With my vector database, applying the filter takes almost 1 minute, while searching without a filter is almost instantaneous.
Describe the proposed solution
For some larger datasets, it could be better to run the search without any filters and apply the filtering afterwards. For this to work consistently, we need the ability to apply filters to the results one by one as they come in, so that the search can be aborted once the required number of results has been found. This would allow quite advanced filtering of searches without having to add that functionality to the core chromadb code.
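A minimal sketch of the idea in Python, assuming the search could expose its results lazily as an iterator in rank order. The `ranked_results` generator is a hypothetical stand-in for such a stream and `take_matching` is an illustrative name; neither is an existing chromadb API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def take_matching(
    results: Iterable[dict], predicate: Callable[[dict], bool], k: int
) -> list[dict]:
    """Return the first k results that pass predicate, consuming as few as possible."""
    # filter() is lazy and islice() stops pulling once k matches are found,
    # which is the "abort the search" step.
    return list(islice(filter(predicate, results), k))


def ranked_results() -> Iterator[dict]:
    # Hypothetical stand-in for a lazy, distance-ordered result stream.
    for i in range(1000):
        yield {
            "id": f"doc{i}",
            "distance": i * 0.01,
            "metadata": {"lang": "en" if i % 3 == 0 else "de"},
        }


hits = take_matching(ranked_results(), lambda r: r["metadata"]["lang"] == "en", k=5)
print([h["id"] for h in hits])  # prints ['doc0', 'doc3', 'doc6', 'doc9', 'doc12']
```

Because the predicate is an ordinary Python callable, arbitrary filter logic stays in user code, and only 13 of the 1000 candidates are ever produced in this example.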
Alternatives considered
An alternative to iteratively looping through the search results would be a built-in function that does the same. This function would accept filters, but instead of applying them to the entire dataset ahead of the search, it would apply them to the results while searching. It would need a maximum number of tries to avoid never-ending loops.
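One way such a helper might look, sketched in Python. It assumes a `search(n)` callable that returns the `n` nearest results in rank order (for chromadb this could wrap a `collection.query(..., n_results=n)` call flattened into a list of records). The names `search_with_post_filter`, `max_tries` and `growth` are hypothetical and not part of chromadb.

```python
from typing import Callable


def search_with_post_filter(
    search: Callable[[int], list[dict]],
    predicate: Callable[[dict], bool],
    k: int,
    max_tries: int = 5,
    growth: int = 2,
) -> list[dict]:
    """Over-fetch unfiltered results, filter them, and widen the fetch until k match."""
    n = k
    matches: list[dict] = []
    for _ in range(max_tries):  # hard cap so the loop always terminates
        candidates = search(n)
        matches = [r for r in candidates if predicate(r)]
        if len(matches) >= k or len(candidates) < n:
            break  # enough matches, or the collection is exhausted
        n *= growth  # ask for more candidates on the next try
    return matches[:k]


# Toy data standing in for a ranked collection: every tenth record matches.
data = [{"id": i, "tag": "keep" if i % 10 == 0 else "skip"} for i in range(100)]
search = lambda n: data[:n]
keep = lambda r: r["tag"] == "keep"

print([r["id"] for r in search_with_post_filter(search, keep, k=3)])  # prints [0, 10, 20]
print([r["id"] for r in search_with_post_filter(search, keep, k=3, max_tries=2)])  # prints [0]
```

The second call shows the trade-off of the tries limit: when the filter is selective, the helper can return fewer than `k` results, and every retry repeats the search with a larger `n`.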
Importance
would make my life easier
Additional Information
You can achieve something similar with the query method, which returns results in batches, but you can't ensure that you get the required number of results each time. So you end up resorting to returning more results than needed most of the time, which is not very efficient.
@kkollsga the way we want to solve this is by adding indexes to metadata. Post-filtering will cause very strange behavior because you will have to massively "over-retrieve" and then possibly end up with 0 results. Very unintuitive.
Also facing the slow "where" filtering problem. I have around 50k items in one collection.
Any updates on this limitation or "best practices" to get around the slow metadata filtering issue? Facing similar problems with datasets on the order of the ones described above.
Hey all. We made significant performance updates to Chroma in newer versions, so closing this for now. If you still experience unacceptable performance, please feel free to open a bug report.