
[Feature Request]: Configure path for `.chroma/index` for DuckDB when used without persistence

Open · strangest-quark opened this issue 2 years ago · 0 comments

Describe the problem

I'm running an app on AWS Lambda with Chroma and want to use DuckDB without persistence. Lambda does not allow writes to any directory other than /tmp, so Chroma fails when it tries to create its index folder. The stack trace is below.

[ERROR]	2023-05-07T14:21:19.196Z	f701e6be-197b-4134-91df-e64c75502077	Exception in 'http' protocol.
Traceback (most recent call last):
  File "/tmp/sls-py-req/mangum/protocols/http.py", line 97, in run
    await app(self.scope, self.receive, self.send)
  File "/tmp/sls-py-req/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/tmp/sls-py-req/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/tmp/sls-py-req/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/tmp/sls-py-req/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/tmp/sls-py-req/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/tmp/sls-py-req/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/tmp/sls-py-req/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/tmp/sls-py-req/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/tmp/sls-py-req/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/tmp/sls-py-req/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/tmp/sls-py-req/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/tmp/sls-py-req/fastapi/routing.py", line 231, in app
    raw_response = await run_endpoint_function(
  File "/tmp/sls-py-req/fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/tmp/sls-py-req/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/tmp/sls-py-req/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/tmp/sls-py-req/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/tmp/sls-py-req/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/var/task/src/main.py", line 69, in comprehendPolicy
    index = getQueryIndex(saveSiteText(policy.policy_link))
  File "/var/task/src/main.py", line 58, in getQueryIndex
    return VectorstoreIndexCreator().from_loaders([loader])
  File "/tmp/sls-py-req/langchain/indexes/vectorstore.py", line 73, in from_loaders
    return self.from_documents(docs)
  File "/tmp/sls-py-req/langchain/indexes/vectorstore.py", line 78, in from_documents
    vectorstore = self.vectorstore_cls.from_documents(
  File "/tmp/sls-py-req/langchain/vectorstores/chroma.py", line 412, in from_documents
    return cls.from_texts(
  File "/tmp/sls-py-req/langchain/vectorstores/chroma.py", line 380, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/tmp/sls-py-req/langchain/vectorstores/chroma.py", line 159, in add_texts
    self._collection.add(
  File "/tmp/sls-py-req/chromadb/api/models/Collection.py", line 111, in add
    self._client._add(ids, self.name, embeddings, metadatas, documents, increment_index)
  File "/tmp/sls-py-req/chromadb/api/local.py", line 140, in _add
    self._db.add_incremental(collection_uuid, added_uuids, embeddings)
  File "/tmp/sls-py-req/chromadb/db/clickhouse.py", line 542, in add_incremental
    index.add(uuids, embeddings)
  File "/tmp/sls-py-req/chromadb/db/index/hnswlib.py", line 124, in add
    self._init_index(dim)
  File "/tmp/sls-py-req/chromadb/db/index/hnswlib.py", line 107, in _init_index
    self._save()
  File "/tmp/sls-py-req/chromadb/db/index/hnswlib.py", line 178, in _save
    os.makedirs(f"{self._save_folder}")
  File "/var/lang/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/var/lang/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '.chroma'

Describe the proposed solution

Could we get a way to specify the base directory in which `.chroma` is created?

Alternatives considered

I've switched to using Chroma with persistence (setting persist_directory to /tmp), which avoids creating `.chroma` in the working directory, and the Lambda now runs. However, I don't need persistence for my use case, and the extra disk I/O may add overhead.
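Another possible stopgap that leaves persistence off: the traceback ends in `os.makedirs` failing on the relative path '.chroma', and relative paths are resolved against the process's current working directory (on Lambda, typically the read-only /var/task). The minimal sketch below assumes nothing about Chroma's own API. It temporarily switches the working directory to a writable location so that a relative `.chroma/index` lands there instead. The helper name `writable_cwd` is hypothetical, not part of Chroma or LangChain.

```python
import contextlib
import os
import tempfile
from typing import Iterator, Optional


@contextlib.contextmanager
def writable_cwd(base_dir: Optional[str] = None) -> Iterator[str]:
    """Hypothetical helper: temporarily chdir into a writable directory.

    Relative paths such as ".chroma/index" are resolved against the current
    working directory, so code run inside this block creates them under
    base_dir. The default, tempfile.gettempdir(), normally resolves to /tmp
    on Linux (including Lambda) unless TMPDIR/TEMP/TMP say otherwise.
    """
    target = base_dir or tempfile.gettempdir()
    previous = os.getcwd()
    os.chdir(target)
    try:
        yield target
    finally:
        # Restore the original working directory even if the body raises.
        os.chdir(previous)


if __name__ == "__main__":
    with writable_cwd():
        # Stand-in for the os.makedirs call that failed in the traceback.
        os.makedirs(".chroma/index", exist_ok=True)
        print(os.path.abspath(".chroma/index"))
```

In the handler this would wrap the index-building call, e.g. `with writable_cwd(): index = VectorstoreIndexCreator().from_loaders([loader])`. Note that `os.chdir` changes the working directory for the whole process, not just the current thread. The traceback shows the endpoint running in a worker thread via `run_in_threadpool`, so this is only safe if nothing else in the process relies on the working directory at the same time. A configurable base path, as requested above, would avoid that caveat.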

Importance

would make my life easier

Additional Information

No response

strangest-quark, May 07 '23 15:05