chromem-go
Add QueryWithNegative
QueryWithNegative allows you to pass a string that will be excluded from the search results.
There are three ways to implement this:
- Find the documents matching the normal query. Then re-order the results by multiplying the similarity by the dot product with the negative vector and some constant.
- Find the documents matching the normal query. Exclude documents where the dot product with the negative vector is above a constant.
- The simpler method I implemented which just subtracts the negative vector from the positive one and re-normalizes the result.
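As a rough illustration of the third option, subtracting and re-normalizing can be sketched as below. This is a minimal sketch and not the code from this PR: the helper name `subtractAndNormalize` is hypothetical, and it assumes both embeddings have the same dimension.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// subtractAndNormalize returns (pos - neg) scaled to unit length.
// Hypothetical helper for illustration; it is not part of chromem-go's API.
func subtractAndNormalize(pos, neg []float32) ([]float32, error) {
	if len(pos) != len(neg) {
		return nil, errors.New("vectors must have the same length")
	}
	out := make([]float32, len(pos))
	var sumSq float64
	for i := range pos {
		d := pos[i] - neg[i]
		out[i] = d
		sumSq += float64(d) * float64(d)
	}
	// If both vectors are identical the difference has no direction,
	// so it cannot be normalized.
	if sumSq == 0 {
		return nil, errors.New("positive and negative vectors are identical")
	}
	norm := float32(math.Sqrt(sumSq))
	for i := range out {
		out[i] /= norm
	}
	return out, nil
}

func main() {
	pos := []float32{1, 0} // toy "positive" embedding
	neg := []float32{0, 1} // toy "negative" embedding
	q, err := subtractAndNormalize(pos, neg)
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // unit-length vector pointing away from neg
}
```

The resulting vector would then be used as the query embedding for a regular similarity search.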
I have done some simple tests and the results look good.
I'm not sure if the extra function is a nice API. It could also be added as an extra argument to Query. Or maybe Query should take a struct as its argument, as the number of arguments seems to keep increasing.
That's an interesting feature!
> - The simpler method I implemented which just subtracts the negative vector from the positive one and re-normalizes the result.
>
> I have done some simple tests and the results look good.
Did you test with positive and negative queries that were similar (e.g. a use case where a result is likely to appear but you want to exclude it), as well as where they were entirely different (e.g. for use cases where some result is already unlikely, but you want to be even more certain to exclude it)?
I'm worried that the resulting vector C=A-B might be too different from A in some cases. E.g.
What about implementing all three and letting the user choose between them?
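For comparison, the two post-processing approaches could look roughly like the sketch below. Everything here is an assumption for illustration: the `result` type, the helper names, and in particular the re-ranking formula `similarity * (1 - weight*negSim)`, which is only one possible reading of "multiplying the similarity by the dot product with the negative vector and some constant". Embeddings are assumed to be unit-length and of equal dimension.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for a search hit.
type result struct {
	ID         string
	Embedding  []float32
	Similarity float32 // similarity to the positive query
}

// dot assumes a and b have the same length.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// filterByNegative drops results whose similarity to neg is above threshold.
func filterByNegative(results []result, neg []float32, threshold float32) []result {
	kept := make([]result, 0, len(results))
	for _, r := range results {
		if dot(r.Embedding, neg) <= threshold {
			kept = append(kept, r)
		}
	}
	return kept
}

// rerankByNegative penalizes each score by its similarity to neg and
// returns a re-sorted copy. The penalty formula is an assumption.
func rerankByNegative(results []result, neg []float32, weight float32) []result {
	out := make([]result, len(results))
	copy(out, results)
	for i := range out {
		out[i].Similarity *= 1 - weight*dot(out[i].Embedding, neg)
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Similarity > out[j].Similarity
	})
	return out
}

func main() {
	neg := []float32{0, 1}
	hits := []result{
		{ID: "a", Embedding: []float32{0, 1}, Similarity: 0.9},
		{ID: "b", Embedding: []float32{1, 0}, Similarity: 0.8},
	}
	fmt.Println(len(filterByNegative(hits, neg, 0.5)))
	fmt.Println(rerankByNegative(hits, neg, 0.5)[0].ID)
}
```

Unlike subtraction, both variants leave the positive query vector untouched and only act on the returned results, at the cost of needing more candidates than the final result count.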
> I'm not sure if the extra function is a nice API. It could also be added as an extra argument to Query. Or maybe Query should take a struct as its argument, as the number of arguments seems to keep increasing.
WDYT about keeping the Query for now (for backward compatibility and catering to simple / most common use cases), but introducing a QueryWithOptions that already takes a struct as parameter with the where, whereDocument and negativeQuery fields?
Related to the above proposal with three implementations, one field could also be an enum for picking the way to exclude the negative query.
Maybe this PR could just introduce the QueryWithOptions and the enum with two values (unspecified, which validates to an error, and the subtract exclusion), and follow-up PRs can introduce the other two ways?
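To make the proposal concrete, an options struct with such an enum might look like the following sketch. All type, field, and constant names are illustrative guesses and are not taken from the PR's code. It also assumes the mode is only validated when a negative query is actually set, so that plain queries keep working with the zero value.

```go
package main

import (
	"errors"
	"fmt"
)

// NegativeMode selects how the negative query is applied (hypothetical name).
type NegativeMode int

const (
	// NegativeModeUnspecified is the zero value and fails validation
	// when a negative query is given.
	NegativeModeUnspecified NegativeMode = iota
	// NegativeModeSubtract subtracts the negative embedding from the
	// positive one and re-normalizes the result.
	NegativeModeSubtract
)

// QueryOptions bundles the query parameters (hypothetical fields).
type QueryOptions struct {
	QueryText     string
	NResults      int
	Where         map[string]string
	WhereDocument map[string]string
	NegativeText  string
	NegativeMode  NegativeMode
}

// Validate reports the first problem found in the options.
func (o QueryOptions) Validate() error {
	if o.QueryText == "" {
		return errors.New("query text is empty")
	}
	if o.NResults <= 0 {
		return errors.New("number of results must be positive")
	}
	if o.NegativeText != "" && o.NegativeMode == NegativeModeUnspecified {
		return errors.New("negative mode must be set when a negative query is given")
	}
	return nil
}

func main() {
	opts := QueryOptions{QueryText: "village builder", NResults: 5, NegativeText: "idle"}
	fmt.Println(opts.Validate()) // mode left unspecified, so this prints an error

	opts.NegativeMode = NegativeModeSubtract
	fmt.Println(opts.Validate()) // prints <nil>
}
```

Adding further exclusion strategies later would then only mean adding enum values, without changing any function signature.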
I rewrote it to QueryWithOptions. Can you please have a look?
I'm not using this feature in production yet. I'm still playing around with it, trying to improve it. Since embedding models can't encode something being negative, I'm currently using a fast model like Phi3 to split a search query users write into a positive and a negative part (see below). And then passing these parts to my index in chromem to find the results.
Break down the following search query in it's positive and negative parts.

For example for
```
football not manager
```
you should return:
```
positive: football
negative: manager
```

Or another example, for:
```
village builder but not idle
```
you should return:
```
positive: village builder
negative: idle
```

Here is the query:
```
...
```
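A reply in that `positive:` / `negative:` format could be turned back into two strings with something like the sketch below. The function name `parseSplit` is hypothetical, and it assumes the model sticks to one `key: value` pair per line; anything else, including code-fence lines, is ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSplit extracts the positive and negative parts from a model reply.
// Lines without a recognized "positive:" or "negative:" prefix are skipped.
func parseSplit(reply string) (positive, negative string) {
	sc := bufio.NewScanner(strings.NewReader(reply))
	for sc.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok {
			continue
		}
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "positive":
			positive = strings.TrimSpace(value)
		case "negative":
			negative = strings.TrimSpace(value)
		}
	}
	return positive, negative
}

func main() {
	reply := "positive: village builder\nnegative: idle"
	pos, neg := parseSplit(reply)
	fmt.Printf("%q %q\n", pos, neg)
}
```

If the negative part comes back empty, the caller can fall back to a plain query without a negative vector.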
Thank you for the changes/improvements! I should be able to review again before the end of the week.