
[Bug]: Lack of transactionality mechanics in create_collection

Open · tazarov opened this issue 3 months ago · 1 comment

What happened?

Creating a collection in Chroma involves several steps:

  1. Create the collection in sysdb
  2. Create segments (metadata + vector)
  3. Create the vector segment in sysdb
  4. Create the metadata segment in sysdb

If any of steps 2-3 fails, Chroma is left in an inconsistent state: the collection is already registered in sysdb, but its segments are not. A subsequent delete_collection or get_or_create_collection may repair the state. A plain create_collection, however, fails with a UniqueConstraintError.
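Until this is fixed, a client-side wrapper along the following lines may help. It is only a sketch: `StubClient` is a hypothetical stand-in that models just the behaviour described in this report (it is not the Chroma client), `create_collection_or_cleanup` is an illustrative helper name, and matching on the "already exists" message text is an assumption taken from the log output in this issue.

```python
class StubClient:
    """Hypothetical stand-in for a client; models only the failure in this report."""

    def __init__(self):
        self.collections = set()

    def create_collection(self, name, metadata=None):
        if name in self.collections:
            raise ValueError(f"Collection {name} already exists")
        self.collections.add(name)  # step 1: collection is registered
        if metadata and "hnsw:batch_size" in metadata:
            # A later step fails, leaving the registration behind.
            raise ValueError("Unknown HNSW parameter: hnsw:batch_size")
        return name

    def delete_collection(self, name):
        if name not in self.collections:
            raise ValueError(f"Collection {name} does not exist")
        self.collections.remove(name)


def create_collection_or_cleanup(client, name, **kwargs):
    """Create a collection; on a partial failure, delete the leftover and re-raise."""
    try:
        return client.create_collection(name, **kwargs)
    except Exception as exc:
        # Assumption: a pre-existing collection is reported with this message.
        # Never delete a collection that existed before this call.
        if "already exists" in str(exc):
            raise
        try:
            client.delete_collection(name)
        except Exception:
            pass  # nothing was left behind, or cleanup itself failed
        raise  # surface the original creation error


client = StubClient()
try:
    create_collection_or_cleanup(client, "test", metadata={"hnsw:batch_size": 100})
except ValueError as e:
    first_error = str(e)

# The retry now succeeds because the leftover entry was removed.
collection = create_collection_or_cleanup(client, "test")
```

The wrapper re-raises the original error, so callers still see why creation failed; it only removes the half-created entry so that a later create_collection can succeed.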

This is not a critical issue, as there are ways to work around it. However, it highlights the need for robust error handling, including but not limited to rollback.
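One possible shape for such a rollback is a stack of compensating actions that runs only when a later step fails. The sketch below is illustrative and is not Chroma's implementation: `InMemorySysDB`, `create_collection_with_rollback`, and `build_segments` are hypothetical names, and the in-memory store stands in for the real sysdb.

```python
from contextlib import ExitStack


class InMemorySysDB:
    """Hypothetical in-memory stand-in for sysdb: collection name -> segment kinds."""

    def __init__(self):
        self.collections = {}

    def create_collection(self, name):
        if name in self.collections:
            raise ValueError(f"Collection {name} already exists")
        self.collections[name] = []

    def delete_collection(self, name):
        self.collections.pop(name, None)

    def create_segment(self, name, kind):
        self.collections[name].append(kind)

    def delete_segment(self, name, kind):
        if name in self.collections:
            self.collections[name].remove(kind)


def create_collection_with_rollback(sysdb, name, build_segments):
    """Run the creation steps; undo completed steps if a later one raises."""
    with ExitStack() as undo:
        sysdb.create_collection(name)                 # step 1
        undo.callback(sysdb.delete_collection, name)  # compensating action
        segments = build_segments()                   # step 2, may raise
        for kind in segments:                         # steps 3 and 4
            sysdb.create_segment(name, kind)
            undo.callback(sysdb.delete_segment, name, kind)
        undo.pop_all()  # success: discard the undo actions without running them
    return name


def failing_segments():
    raise ValueError("Unknown HNSW parameter: hnsw:batch_size")


sysdb = InMemorySysDB()
try:
    create_collection_with_rollback(sysdb, "test", failing_segments)
except ValueError:
    pass
state_after_failure = dict(sysdb.collections)  # rollback removed the collection

create_collection_with_rollback(sysdb, "test", lambda: ["vector", "metadata"])
```

The undo action for step 1 is registered only after step 1 succeeds, so a genuine "already exists" error never deletes a collection that was there before the call.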

Versions

Chroma 0.4.x and 0.5.x (single-node), any OS or Python version

Relevant log output

Python 3.11.7 (main, Dec 30 2023, 14:03:09) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import chromadb
>>> client = chromadb.Client()
>>> try:
...     client.create_collection("test",metadata={"hnsw:batch_size":100})
... except Exception as e:
...     print(e)
... 
Unknown HNSW parameter: hnsw:batch_size
>>> client.create_collection("test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/api/client.py", line 198, in create_collection
    return self._server.create_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/api/segment.py", line 173, in create_collection
    coll, created = self._sysdb.create_collection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/db/mixins/sysdb.py", line 220, in create_collection
    raise UniqueConstraintError(f"Collection {name} already exists")
chromadb.db.base.UniqueConstraintError: Collection test already exists

Note: The issue above is reproducible with in-memory Chroma on single-node local or server deployments (distributed not tested).

tazarov avatar May 01 '24 15:05 tazarov