[Feature Request]: Querying by metadata fields is not flexible enough

Open abrhaleitela opened this issue 1 year ago • 10 comments

Describe the problem

I am weighing the trade-off between creating thousands of Chroma collections and having a few collections with more complex metadata objects, so that I can filter and query using operations on different data types.

Do you plan to support (in the near future) more operations and data types (mainly custom objects such as json objects) in a given metadata?

Example:

collection.add(
    documents=[doc1],
    metadatas=[{"metadata1": [{"k1": "v1"}, {"k2": "v2"}]}],
    ids=["id1"]
)

Also, do you have any plans to support a string $contains operation in the metadata where condition?

Example:

result = collection.query(
    query_texts=["This is sample query text"],
    where={"string_type_metadata_field": {"$contains": "substring"}}
)

Describe the proposed solution

  1. I would love for collection metadata to accept fields of any type, for example list, map, set, or JSON. Today the only supported types seem to be str, int, float, and bool.
  2. When querying with the where keyword, I would love to see more operations supported, such as contains for strings, lists, maps, and sets. Today the only supported operations seem to be $gt, $gte, $lt, $lte, $ne, $eq, $in, and $nin.
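
Until such an operator exists, a substring match on a metadata field can be approximated on the client after fetching records. The following is a minimal sketch, not Chroma's API: it assumes the records are already available as a dict with parallel "ids" and "metadatas" lists, and filter_contains is a hypothetical helper name.

```python
from typing import Any


def filter_contains(result: dict[str, Any], field: str, needle: str) -> list[str]:
    """Return the ids whose metadata[field] is a string containing `needle`."""
    matched = []
    for record_id, metadata in zip(result["ids"], result["metadatas"]):
        # Records without metadata, or with a non-string value, never match.
        value = (metadata or {}).get(field)
        if isinstance(value, str) and needle in value:
            matched.append(record_id)
    return matched


# Hand-written stand-in for fetched records; no database is involved here.
sample = {
    "ids": ["id1", "id2", "id3"],
    "metadatas": [
        {"title": "a substring match"},
        {"title": "no match here"},
        {"title": 42},
    ],
}
matching_ids = filter_contains(sample, "title", "substring")
```

Filtering after the fact means every candidate record is transferred first, so this only suits collections small enough to scan.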

Alternatives considered

No response

Importance

would make my life easier

Additional Information

No response

abrhaleitela avatar Sep 29 '23 02:09 abrhaleitela

I fully support the proposed features. One addition:

Besides $contains, I would also appreciate $regex (as in MongoDB: @Link)

Thanks for the excellent work so far!

nielscs avatar Oct 18 '23 06:10 nielscs

I also needed more filtering possibilities, so I went to investigate what can be done. The collections are implemented as SQL databases, so I don't think supporting more complex metadata would be possible (correct me if I'm wrong :)).

However, additional operators, such as the $regex @nielscs mentioned and something similar to the $contains @abrhaleitela mentioned, can be implemented, and they can also serve as a workaround for not having more complex metadata.

For example, it would be ideal for my use case to have a list in one metadata field and then filter the database based on what is in the list. I implemented the $like operator for the where operation and the $regex operator for both the where and where_document operations, and I was able to simulate the behavior I needed using these.

I created a pull request with these changes (https://github.com/chroma-core/chroma/pull/1393) ; hopefully that helps!

jelena-sarajlic avatar Nov 14 '23 13:11 jelena-sarajlic

I am confused by your examples. You say you want to apply this filtering to list metadata, but in your examples the metadata values are plain strings, not lists. I have the same problem, so I guess I have to convert my list metadata into a string and then apply the $like operator to check whether the string contains my substring?
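
The string-encoding approach asked about here can be sketched in plain Python. Wrapping every item in a delimiter keeps a substring test from matching partial items, so a search for "|cat|" cannot hit "|category|". The "|" delimiter and both helper names are assumptions for illustration, and the items must not contain the delimiter themselves.

```python
DELIMITER = "|"  # assumed separator; items must not contain it


def encode_list(items: list[str]) -> str:
    """Join items as '|a|b|c|' so every item is delimited on both sides."""
    for item in items:
        if DELIMITER in item:
            raise ValueError(f"item {item!r} contains the delimiter")
    return DELIMITER + DELIMITER.join(items) + DELIMITER


def contains_pattern(item: str) -> str:
    """Substring to search for when testing membership of `item`."""
    return f"{DELIMITER}{item}{DELIMITER}"


encoded = encode_list(["cat", "dog"])          # value to store in the metadata field
has_cat = contains_pattern("cat") in encoded   # whole item: matches
has_ca = contains_pattern("ca") in encoded     # partial item: does not match
```

The encoded string is what would go into the metadata field, and the pattern is what would be handed to whichever substring operator is available.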

LazyAIEnjoyer avatar Jan 08 '24 11:01 LazyAIEnjoyer

Not having a simple $like operator, as most SQL-based databases do, is almost a deal breaker for me, and I only realized the option was missing after setting up a lot of code to use Chroma. Even if Chroma cannot offer something as powerful as $regex, at least $contains (LIKE '%string%') would be greatly appreciated.

pevogam avatar Jan 10 '24 17:01 pevogam

@pevogam, we have a pending PR on this: https://github.com/chroma-core/chroma/pull/1196. Adding these operators is not that difficult, but the team is mindful of adding operators that might be difficult to carry over to the distributed/hosted version of Chroma.

tazarov avatar Jan 10 '24 17:01 tazarov

Thanks @tazarov for linking the PR here for those who end up searching the issues first. If the functionality has to be disabled in certain deployments but can easily be made available in others, perhaps we could simply detect the type of use and disable it where needed? I will check the details in the PR now.

pevogam avatar Jan 11 '24 04:01 pevogam

The author mentioned adding lists to metadata. Is that something that might happen eventually? The way I store lists is by doing str(my_list).

valentin-fngr avatar Apr 20 '24 11:04 valentin-fngr

I wholly support this. I find it a bit silly that metadata can only be strings. In my use case, for example, I have a list of documents extracted from a PDF, where each document is a page. The PDF contains an outline and an index, and I would love to add a list of keywords (extracted from the outline and/or index) to the metadata of each page. But right now the best I can do is save the list as a string and then split the string again when I need to consume the metadata, which is a silly extra step that is also error-prone if you're not careful about your separators.
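
One way to sidestep the separator problem described above is to store the list as a JSON string instead of joining it by hand, since JSON escapes whatever the items contain. This is a minimal sketch of that idea using only the standard library; the helper names are hypothetical and nothing here touches Chroma itself.

```python
import json


def keywords_to_metadata(keywords: list[str]) -> str:
    """Serialize a keyword list to a JSON string that fits a string metadata field."""
    return json.dumps(keywords, ensure_ascii=False)


def keywords_from_metadata(value: str) -> list[str]:
    """Restore the keyword list from the stored JSON string."""
    return json.loads(value)


# A keyword containing a comma survives the round trip unchanged.
keywords = ["vector search", "filters, operators", "metadata"]
stored = keywords_to_metadata(keywords)
restored = keywords_from_metadata(stored)
```

The extra decode step remains, but the round trip no longer depends on choosing a separator that never appears in a keyword.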

btonasse avatar Apr 26 '24 09:04 btonasse

Complex metadata support is much needed.

Raj725 avatar Jul 03 '24 05:07 Raj725

Yes please, at least a string-contains query in metadata would go a long way

armouti avatar Aug 11 '24 11:08 armouti

Complex metadata support, such as a list of strings, is needed ASAP, especially since other vector DBs (Qdrant) support this.

owquresh avatar Sep 25 '24 04:09 owquresh