
Chromadb getting locked.

Open bbl-my opened this issue 7 months ago • 2 comments

What happened?

I have indexed 2,101,951 chunks into ChromaDB. Each chunk is 512 tokens. Embeddings used: tiktoken

Below is the code used for indexing. Only one process does the indexing, and it runs continuously with the code below.

    def get_or_create_collection(self, client, collection_name):
        """
        Attempts to retrieve a collection. If it does not exist, creates a new one.

        :param client: The database client instance.
        :param collection_name: The name of the collection to retrieve or create.
        :return: The retrieved or newly created collection.
        :raises: Re-raises unexpected exceptions.
        """
        try:
            return client.get_collection(collection_name)
        except Exception as e:
            if "Collection" in str(e) and "does not exist" in str(e):
                logging.warning("Collection %s not found. Creating a new one.", collection_name)
                return client.create_collection(collection_name, metadata={"hnsw:space": "cosine"})
            else:
                logging.error("Unexpected error while getting collection %s: %s", collection_name, str(e))
                raise  # Re-raise the unexpected exception

    # Called from the indexing loop (same class as the method above):
    collection = self.get_or_create_collection(client, collection_name)
    collection.add(embeddings=embeddings, ids=ids, metadatas=meta_info, documents=chunks)

A second process exposes a REST API for a few use cases. It only runs queries against the collection into which these documents are indexed.

Collection name: BusinessFocus_document; Tenant: BusinessFocus; Database: default_database

The issue I am facing right now is that ChromaDB gets deadlocked (which seems to be more of an SQLite issue). I am running the default Chroma setup in Docker. Below are the logs:

INFO:     [20-03-2025 11:12:33] 172.18.0.1:42200 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:34] 172.18.0.1:42200 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:34] 172.18.0.1:42200 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:35] 172.18.0.1:42200 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:35] 172.18.0.1:42200 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:43] 172.18.0.1:37054 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:43] 172.18.0.1:37054 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:49] 172.18.0.1:37068 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:49] 172.18.0.1:37068 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:51] 172.18.0.1:37068 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:51] 172.18.0.1:37068 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:12:57] 172.18.0.1:51582 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:12:57] 172.18.0.1:51582 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:13:05] 172.18.0.1:42864 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:13:05] 172.18.0.1:42864 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:13:12] 172.18.0.1:37588 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:13:12] 172.18.0.1:37588 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:17:19] 172.18.0.1:59490 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:17:19] 172.18.0.1:59490 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:17:27] 172.18.0.1:52980 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:17:27] 172.18.0.1:52980 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:17:42] 172.18.0.1:47672 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:17:42] 172.18.0.1:47672 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:17:46] 172.18.0.1:47672 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:17:47] 172.18.0.1:47672 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:17:53] 172.18.0.1:57436 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
INFO:     [20-03-2025 11:17:53] 172.18.0.1:57436 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 201
INFO:     [20-03-2025 11:18:00] 172.18.0.1:38730 - "GET /api/v2/tenants/Pharma_Regulatory HTTP/1.1" 200
INFO:     [20-03-2025 11:18:00] 172.18.0.1:38742 - "GET /api/v2/auth/identity HTTP/1.1" 200
INFO:     [20-03-2025 11:18:00] 172.18.0.1:38746 - "GET /api/v2/tenants/Pharma_Regulatory HTTP/1.1" 200
INFO:     [20-03-2025 11:18:00] 172.18.0.1:38746 - "GET /api/v2/tenants/Pharma_Regulatory/databases/default_database HTTP/1.1" 200
INFO:     [20-03-2025 11:18:00] 172.18.0.1:38742 - "GET /api/v2/tenants/Pharma_Regulatory/databases/default_database/collections/Pharma_Regulatory_document HTTP/1.1" 200
INFO:     [20-03-2025 11:22:27] 172.18.0.1:44220 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200
ERROR:    [20-03-2025 11:41:10] database is locked
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 111, in receive
    return self.receive_nowait()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 106, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 124, in receive
    return receiver.item
           ^^^^^^^^^^^^^
AttributeError: 'MemoryObjectItemReceiver' object has no attribute 'item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 157, in call_next
    message = await recv_stream.receive()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 126, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/chroma/chromadb/server/fastapi/__init__.py", line 107, in catch_exceptions_middleware
    return await call_next(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    with collapse_excgroups():
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/server/fastapi/__init__.py", line 131, in check_http_version_middleware
    return await call_next(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 134, in async_wrapper
    return await f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/server/fastapi/__init__.py", line 807, in add
    await to_thread.run_sync(
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/server/fastapi/__init__.py", line 789, in process_add
    return self._api._add(
           ^^^^^^^^^^^^^^^
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/api/segment.py", line 103, in wrapper
    return self._rate_limit_enforcer.rate_limit(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/rate_limit/simple_rate_limit/__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/api/segment.py", line 437, in _add
    self._producer.submit_embeddings(collection_id, records_to_submit)
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/db/mixins/embeddings_queue.py", line 238, in submit_embeddings
    with self.tx() as cur:
  File "/chroma/chromadb/db/impl/sqlite.py", line 55, in __exit__
    self._conn.commit()
  File "/chroma/chromadb/db/impl/sqlite_pool.py", line 33, in commit
    self._conn.commit()
sqlite3.OperationalError: database is locked
INFO:     [20-03-2025 11:41:11] 172.18.0.1:44220 - "POST /api/v2/tenants/BusinessFocus/databases/default_database/collections/34bd3d87-4674-4a6b-a304-6fee07e0ba81/add HTTP/1.1" 500
ERROR:    [20-03-2025 11:41:11] cannot start a transaction within a transaction
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 111, in receive
    return self.receive_nowait()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 106, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 124, in receive
    return receiver.item
           ^^^^^^^^^^^^^
AttributeError: 'MemoryObjectItemReceiver' object has no attribute 'item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 157, in call_next
    message = await recv_stream.receive()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/streams/memory.py", line 126, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/chroma/chromadb/server/fastapi/__init__.py", line 107, in catch_exceptions_middleware
    return await call_next(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    with collapse_excgroups():
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    response = await self.dispatch_func(request, call_next)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/server/fastapi/__init__.py", line 131, in check_http_version_middleware
    return await call_next(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 134, in async_wrapper
    return await f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/server/fastapi/__init__.py", line 694, in get_collection
    await to_thread.run_sync(
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/api/segment.py", line 103, in wrapper
    return self._rate_limit_enforcer.rate_limit(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/rate_limit/simple_rate_limit/__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/api/segment.py", line 293, in get_collection
    existing = self._sysdb.get_collections(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/telemetry/opentelemetry/__init__.py", line 150, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/chroma/chromadb/db/mixins/sysdb.py", line 444, in get_collections
    with self.tx() as cur:
  File "/chroma/chromadb/db/impl/sqlite.py", line 41, in __enter__
    self._conn.execute("BEGIN;")
  File "/chroma/chromadb/db/impl/sqlite_pool.py", line 29, in execute
    return self._conn.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: cannot start a transaction within a transaction
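
One detail in the log above is the timing: the last successful request is stamped 11:22:27 and the first error 11:41:10, both around client port 44220. A minimal sketch for finding such pauses in a longer log follows. The helper name `largest_gap` is made up for illustration, and the timestamp format is assumed from the lines shown here.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines in this report.
LOG_TIME_FORMAT = "%d-%m-%Y %H:%M:%S"


def largest_gap(log_lines):
    """Return (seconds, earlier_line, later_line) for the longest pause
    between consecutive timestamped lines, or None if fewer than two exist."""
    stamped = []
    for line in log_lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # traceback lines carry no timestamp
        try:
            when = datetime.strptime(line[start + 1:end], LOG_TIME_FORMAT)
        except ValueError:
            continue  # brackets that do not hold a timestamp
        stamped.append((when, line))

    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# Three lines copied from the log in this report.
sample = [
    'INFO:     [20-03-2025 11:18:00] 172.18.0.1:38742 - "GET /api/v2/auth/identity HTTP/1.1" 200',
    'INFO:     [20-03-2025 11:22:27] 172.18.0.1:44220 - "GET /api/v2/tenants/BusinessFocus/databases/default_database/collections/BusinessFocus_document HTTP/1.1" 200',
    'ERROR:    [20-03-2025 11:41:10] database is locked',
]
gap = largest_gap(sample)
print(gap[0], "seconds before:", gap[2])
```

On these three lines the longest pause is the one before the `database is locked` error, which suggests the failing `add` request was stuck for a long time before it returned 500.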

After this error, the only option left is to restart the Docker container, but the issue recurs roughly every 2-4 hours.
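
A possible reading of the two tracebacks, which I have not confirmed against Chroma's source: in SQLite, a `COMMIT` that fails with `database is locked` leaves the transaction open, so a later `BEGIN` on the same connection fails with `cannot start a transaction within a transaction`. The sketch below reproduces that sequence with Python's standard `sqlite3` module only. It does not use Chroma, and the function name, table name, and the zero timeout are chosen for illustration.

```python
import os
import sqlite3
import tempfile


def reproduce_lock_then_nested_begin():
    """Return ([error messages], still_in_transaction) for a failed COMMIT
    followed by a second BEGIN on the same connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    # Create the schema once and close, so the file starts with no locks held.
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE queue (id INTEGER)")
    setup.commit()
    setup.close()

    # isolation_level=None: Python issues no implicit BEGIN/COMMIT.
    # timeout=0: fail immediately instead of waiting for the lock.
    reader = sqlite3.connect(path, isolation_level=None, timeout=0)
    writer = sqlite3.connect(path, isolation_level=None, timeout=0)

    # The reader opens a transaction and keeps its SHARED lock.
    reader.execute("BEGIN")
    reader.execute("SELECT count(*) FROM queue").fetchall()

    # The writer can stage a change (RESERVED lock) while the reader is active.
    writer.execute("BEGIN")
    writer.execute("INSERT INTO queue VALUES (1)")

    errors = []
    try:
        # COMMIT needs an EXCLUSIVE lock, which the open reader blocks.
        writer.execute("COMMIT")
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))

    # The failed COMMIT did not end the transaction on this connection.
    still_open = writer.in_transaction

    try:
        # Reusing the connection and issuing BEGIN again fails differently.
        writer.execute("BEGIN")
    except sqlite3.OperationalError as exc:
        errors.append(str(exc))

    # Clean up: roll back both transactions and close.
    writer.execute("ROLLBACK")
    reader.execute("ROLLBACK")
    writer.close()
    reader.close()
    return errors, still_open


errors, still_open = reproduce_lock_then_nested_begin()
print(errors, still_open)
```

If something similar happens inside the server, a pooled connection that is reused after a failed commit without a rollback would explain why every later request fails until the container is restarted. That part is an assumption on my side.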

Versions

AWS Linux, t2.2xlarge. Docker Chroma version: 0.5.20

Relevant log output


bbl-my, Mar 20 '25 13:03