[Bug]: Removal of metadata key
What happened?
mikekk — Yesterday (14-Jan-24) at 9:32 PM
Hello everyone, I am able to add new keys and their values to the metadatas, but once I've added a new key to a collection's metadatas using collection.update, I can only change its value; I can no longer delete the key from the metadatas. Why does that happen? Is it a bug or intended behaviour? Thanks.
https://discord.com/channels/1073293645303795742/1191351965230313472/1195812744188932136
Versions
Chroma 0.4.22, Colab, MacOS
Relevant log output
import chromadb
client = chromadb.PersistentClient() # persists to ./chroma by default; use chromadb.Client() for an in-memory client, adjust as per your needs
collection = client.get_or_create_collection("mytest")
collection.add(ids=["id1"],documents=["document 1"],metadatas=[{"key_to_keep":1,"key_to_remove":2}])
records = collection.get(ids=["id1"])
print(records["metadatas"][0])
# {'key_to_keep': 1, 'key_to_remove': 2}
del records["metadatas"][0]["key_to_remove"] #remove the unnecessary key
print(records)
# {'ids': ['id1'], 'embeddings': None, 'metadatas': [{'key_to_keep': 1}], 'documents': ['document 1'], 'uris': None, 'data': None}
collection.update(ids=records["ids"],documents=records["documents"], embeddings=records["embeddings"],metadatas=records["metadatas"])
# verify
records1 = collection.get(ids=["id1"])
print(records1["metadatas"][0])
# {'key_to_keep': 1, 'key_to_remove': 2}
@tazarov If I understand correctly, this happens because we only insert the supplied metadata keys when updating records and never touch the existing metadata. Does deleting the record's old metadata and then inserting the new metadata sound like a good approach? If so, I can raise a PR.
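To make the two behaviours concrete, here is a minimal plain-Python model. It assumes, as the repro above suggests, that `update` currently merges incoming keys into the stored metadata. The names `merge_update` and `replace_update` are hypothetical, and this is not Chroma's internal storage code.

```python
from typing import Dict, Union

Metadata = Dict[str, Union[str, int, float, bool]]


def merge_update(stored: Metadata, incoming: Metadata) -> Metadata:
    """Observed behaviour (assumed): incoming keys are upserted, absent keys survive."""
    result = dict(stored)      # copy so the stored dict is not mutated
    result.update(incoming)    # keys missing from `incoming` are left untouched
    return result


def replace_update(stored: Metadata, incoming: Metadata) -> Metadata:
    """Proposed behaviour: delete the old metadata, then insert the incoming dict."""
    # `stored` is deliberately ignored: every old key is dropped wholesale.
    return dict(incoming)


stored = {"key_to_keep": 1, "key_to_remove": 2}
incoming = {"key_to_keep": 1}
print(merge_update(stored, incoming))    # {'key_to_keep': 1, 'key_to_remove': 2}
print(replace_update(stored, incoming))  # {'key_to_keep': 1}
```

Under the merge model there is no way to express "remove this key", which matches what the repro observes; the replace model fixes that but forces callers to resend every key they want to keep.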
@GauravWaghmare, take a look at the CIP PR for this - https://github.com/chroma-core/chroma/pull/1636. There is a bit more "nuance" to things.
I'd love your input on the CIP.
@tazarov Why should metadata updates behave any differently from document updates?
@GauravWaghmare, to accommodate the different ways people update metadata. Users update a document in a single overwriting operation: the intent is clear, and they don't need to know the document's current contents to make the update. Metadata updates vary significantly by use case. The PR addresses as many use cases as I could think of:
- Clear the metadata
- Partial update
- Full overwrite of metadata
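The three use cases above can be sketched as a single hypothetical function over plain dicts. `UpdateMode`, `apply_update`, and the convention that a `None` value deletes a key in partial mode are all assumptions made for illustration; they are not the API proposed in the CIP or anything Chroma currently exposes.

```python
from enum import Enum
from typing import Dict, Optional, Union

Value = Union[str, int, float, bool]


class UpdateMode(Enum):
    CLEAR = "clear"          # drop every key
    PARTIAL = "partial"      # touch only the keys that are sent
    OVERWRITE = "overwrite"  # replace the whole dict


def apply_update(
    stored: Dict[str, Value],
    incoming: Optional[Dict[str, Optional[Value]]],
    mode: UpdateMode,
) -> Dict[str, Value]:
    """Return the new metadata for one record; `stored` is never mutated."""
    if mode is UpdateMode.CLEAR:
        return {}
    incoming = incoming or {}
    if mode is UpdateMode.OVERWRITE:
        # Full overwrite: old keys are discarded; None values are simply dropped.
        return {k: v for k, v in incoming.items() if v is not None}
    # Partial update: upsert the sent keys, keep everything else.
    result = dict(stored)
    for key, value in incoming.items():
        if value is None:
            result.pop(key, None)  # hypothetical convention: None removes the key
        else:
            result[key] = value
    return result


stored = {"key_to_keep": 1, "key_to_remove": 2}
print(apply_update(stored, None, UpdateMode.CLEAR))                      # {}
print(apply_update(stored, {"key_to_remove": None}, UpdateMode.PARTIAL))  # {'key_to_keep': 1}
print(apply_update(stored, {"new_key": "x"}, UpdateMode.OVERWRITE))       # {'new_key': 'x'}
```

Making the mode explicit keeps the common "change one value" case cheap (no need to fetch the record first) while still giving the original reporter a way to remove `key_to_remove`.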
Please elaborate on your thoughts in the PR if you feel the above use cases can somehow be merged into one or if you have a different idea about the implementation.
Will track in #839