
[Feature Request]: add more operators in where filter by metadata

Open gyula-coder opened this issue 1 year ago • 3 comments

Describe the problem

I created a collection in which each document consists of page_content and a metadata field 'style'. The style value is a comma-separated list of style types, for example 'modern style, log style, french style'. When I run a vector query, I want to keep only the documents whose style metadata includes a certain style name, but I found that $contains is supported only by the where_document filter, not by the where filter. Will this feature be added?
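Until the where filter supports this, one way to get the behaviour is to over-fetch and filter the results on the client. The sketch below is a hypothetical helper, not part of Chroma: it assumes a result dict shaped like the one returned by collection.query() (each field is a list with one inner list per query) and treats the style value as comma-separated items, so "log style" matches only as a whole item rather than as an arbitrary substring.

```python
from typing import Any, Dict, List, Optional


def has_style(meta: Optional[Dict[str, Any]], key: str, style: str) -> bool:
    """True if metadata[key] is a comma-separated string containing `style` as a whole item."""
    value = (meta or {}).get(key)
    if not isinstance(value, str):
        return False
    items = {part.strip().lower() for part in value.split(",")}
    return style.strip().lower() in items


def post_filter(
    results: Dict[str, Any], key: str, style: str, negate: bool = False
) -> Dict[str, List[Any]]:
    """Filter the first query's rows of a query()-shaped result dict (hypothetical helper).

    negate=False approximates the requested "contains" operator;
    negate=True approximates "not_contains" (rows without the key are kept).
    """
    rows = results["metadatas"][0]
    keep = [i for i, meta in enumerate(rows) if has_style(meta, key, style) != negate]
    # Only carry over the per-row fields that were actually returned.
    fields = [
        f for f in ("ids", "documents", "metadatas", "distances")
        if results.get(f) is not None
    ]
    return {f: [results[f][0][i] for i in keep] for f in fields}
```

Because post-filtering can only drop rows from the top-k, you would typically request a larger n_results than you finally need.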

Describe the proposed solution

Add contains and not_contains operators ($contains and $not_contains, as in where_document) to the where filter.
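A possible workaround with the operators that already exist is to denormalize the comma-separated string into one boolean metadata key per style and filter with $eq. The sketch assumes a known style vocabulary; the key naming scheme (style_<name>) and the helper names are hypothetical. Every vocabulary style is stored explicitly as True or False so that "not contains" becomes a plain equality test on False. Styles that are not in the vocabulary are dropped.

```python
from typing import Any, Dict, Iterable, List


def style_flag_key(style: str) -> str:
    # Hypothetical naming scheme: "modern style" -> "style_modern_style"
    return "style_" + style.strip().lower().replace(" ", "_")


def styles_to_flags(style_csv: str, vocabulary: Iterable[str]) -> Dict[str, bool]:
    """Turn 'a, b, c' into one boolean metadata entry per vocabulary style."""
    present = {s.strip().lower() for s in style_csv.split(",") if s.strip()}
    return {style_flag_key(s): s.strip().lower() in present for s in vocabulary}


def build_where(include: List[str], exclude: List[str]) -> Dict[str, Any]:
    """Build a where filter: every `include` style set, every `exclude` style unset."""
    clauses: List[Dict[str, Any]] = [{style_flag_key(s): {"$eq": True}} for s in include]
    clauses += [{style_flag_key(s): {"$eq": False}} for s in exclude]
    if not clauses:
        raise ValueError("need at least one style to filter on")
    # A single condition is returned unwrapped rather than inside a one-item $and.
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}
```

The flags would be merged into each document's metadata at insert time, and the filter passed as the where argument of collection.query(). The trade-off is that adding a new style to the vocabulary means re-writing existing metadata.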

Alternatives considered

No response

Importance

nice to have

Additional Information

No response

gyula-coder, Apr 21 '24 17:04

@tazarov I'm pretty sure there is already an issue for this. Can you help locate it? Thank you! (Ironically, this is a good use case for vector search, which GitHub does not have natively yet.)

jeffchuber, Apr 22 '24 04:04

We have a PR (#1196) with this functionality that has been pending for a while, and a lot of people seem interested in it. The main challenge is feature parity with distributed/hosted Chroma. Some features are relatively simple to implement in both, but from discussions with @HammadB and @beggers, it appears this one may take some time to support in the Rust backend.

It is worth spending some cycles on the most efficient way to ship features experimentally, so that people can try them out and decide whether they are worth it and, of course, so that the team can work out what feature parity would look like in distributed/hosted Chroma.

Here is a practical view of things:

Hypothesis: like/contains mechanics for metadata fields seem like a good idea.
Reality: empirical evidence on metadata performance shows that a feature like this can be hard to scale beyond trivial database sizes (e.g. 100k+ records) on single-node Chroma. (Side note: metadata performance is something I am actively exploring.)
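The scaling concern can be illustrated with a small SQLite experiment. Single-node Chroma stores metadata in SQLite, but the table below is a simplified, hypothetical schema rather than Chroma's real one. EXPLAIN QUERY PLAN shows that an equality predicate can use a B-tree index, while a substring match with a leading % wildcard cannot, so its cost grows with the number of rows.

```python
import sqlite3
from typing import List


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> List[str]:
    """Return the 'detail' column (4th column) of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical metadata table: one row per (record, key) pair.
conn.execute("CREATE TABLE meta (id INTEGER PRIMARY KEY, key TEXT, string_value TEXT)")
conn.execute("CREATE INDEX idx_value ON meta (string_value)")
conn.executemany(
    "INSERT INTO meta (key, string_value) VALUES ('style', ?)",
    [("modern style, log style",), ("french style",)],
)

# Equality: the B-tree index can seek directly to matching values (a SEARCH step).
eq_plan = query_plan(conn, "SELECT id FROM meta WHERE string_value = ?", ("french style",))
# Substring: the leading '%' wildcard defeats the index, so every row is visited (a SCAN step).
like_plan = query_plan(conn, "SELECT id FROM meta WHERE string_value LIKE ?", ("%log style%",))

print(eq_plan)
print(like_plan)
```

The exact plan text differs between SQLite versions, but the SEARCH versus SCAN distinction is what matters here.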

tazarov, Apr 22 '24 08:04

ok, thanks

gyula-coder, Apr 28 '24 12:04