[Bug]: Unable to modify (update) collection metadata - "hnsw:space" will be lost.
What happened?
If a collection is declared with the "hnsw:space" attribute, there is no way to update its existing metadata and keep the "hnsw:space" key, because of this code:
def _validate_modify_request(self, metadata: Optional[CollectionMetadata]) -> None:
    if metadata is not None:
        validate_metadata(metadata)
        if "hnsw:space" in metadata:
            raise ValueError(
                "Changing the distance function of a collection once it is created is not supported currently."
            )
So currently I have no way to update the metadata at all without breaking the existing metadata.
Versions
latest version, python 3.12
Relevant log output
No response
Hey @amaxcz, thanks for reporting this. We have ongoing work that addresses this issue: #1637
Additionally, in 0.6.0 we'll split the HNSW index configuration from the collection metadata.
It is important to note that your HNSW config is not lost; otherwise Chroma wouldn't be able to perform its basic operation, semantic search. The actual (i.e. usable) HNSW configuration is kept in a separate table in Chroma's system DB.
As a workaround to the above, I'd suggest you use (temporarily) get_or_create_collection to update your metadata:
Using collection.modify():
import chromadb
client = chromadb.PersistentClient("test_metadata_update")
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "test", "hnsw:space":"cosine"})
col.add(ids=["1"], documents=["document 1"])
print(col.metadata)
# {'test': 'test', 'hnsw:space': 'cosine'}
col.modify(metadata={"test": "test2"})
print(col.metadata)  # metadata gets overridden, and if you try to add hnsw:space you get the error above
# {'test': 'test2'}
Using client.get_or_create_collection():
import chromadb
client = chromadb.PersistentClient("test_metadata_update")
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "test", "hnsw:space":"cosine"})
col.add(ids=["1"], documents=["document 1"])
print(col.metadata)
col = client.get_or_create_collection("test_metadata_update", metadata={"test": "new_value", "hnsw:space":"cosine"})
print(col.metadata)
# {'hnsw:space': 'cosine', 'test': 'new_value'}
IMPORTANT: The HNSW space indeed cannot be changed. Even if you pass a different space in the metadata (e.g. hnsw:space = ip), it will not affect the index Chroma actually uses.
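If you want to avoid retyping the full metadata on every update, the workaround above could be wrapped in a small helper. This is only a sketch: `merge_metadata` and `update_metadata` are hypothetical names, not part of Chroma, and the helper assumes that `get_or_create_collection` overwrites the stored metadata as shown in the example above, which may not hold in other versions.

```python
from typing import Any, Dict, Optional


def merge_metadata(
    existing: Optional[Dict[str, Any]], updates: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `existing` with `updates` applied on top.

    Hypothetical helper: keys starting with "hnsw:" are never taken from
    `updates`, so the original "hnsw:space" entry is carried over as-is.
    """
    merged = dict(existing or {})
    for key, value in updates.items():
        if key.startswith("hnsw:"):
            continue  # keep whatever the collection was created with
        merged[key] = value
    return merged


def update_metadata(client: Any, name: str, updates: Dict[str, Any]) -> Any:
    """Hypothetical wrapper around the get_or_create_collection workaround.

    `client` is expected to be a Chroma client (e.g. a PersistentClient).
    """
    current = client.get_collection(name).metadata
    return client.get_or_create_collection(
        name, metadata=merge_metadata(current, updates)
    )
```

With the collection from the examples above, `update_metadata(client, "test_metadata_update", {"test": "new_value"})` would pass both `test` and the original `hnsw:space` to `get_or_create_collection`, so the displayed metadata keeps the `hnsw:space` key.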