[Bug]: Setting chromadb client-server results in "Remote end closed connection without response"
What happened?
I can't get it to work in client-server mode, where my app is the client. I initially reported this on the langchain repo (#2144) because that's where I started using it, from memory, and it seemed to work fine there, but it produced too big a Docker image, so I tried to move Chroma out and use it as an external service.
That's when I started having issues. My code:
# imports assumed from context: os for the env vars, Chroma from langchain
import os

from chromadb.config import Settings
from langchain.vectorstores import Chroma

chroma_settings = Settings(
    chroma_api_impl=os.environ.get("CHROMA_API_IMPL"),
    chroma_server_host=os.environ.get("CHROMA_SERVER_HOST"),
    chroma_server_http_port=os.environ.get("CHROMA_SERVER_HTTP_PORT"),
)
vectorstore = Chroma(collection_name="langchain_store", client_settings=chroma_settings)
docsearch = vectorstore.from_documents(webpage, embeddings, collection_name="webpage")
The compose file in use:
version: '3'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:22.9-alpine
    platform: linux/amd64
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - CLICKHOUSE_TCP_PORT=9000
      - CLICKHOUSE_HTTP_PORT=8123
    ports:
      - '8123:8123'
      - '9000:9000'
    volumes:
      - ./clickhouse_data:/bitnami/clickhouse
      - ./backups:/backups
      - ./chroma/config/backup_disk.xml:/etc/clickhouse-server/config.d/backup_disk.xml
      - ./chroma/config/chroma_users.xml:/etc/clickhouse-server/users.d/chroma.xml
    networks:
      - default
  chroma_server:
    image: ghcr.io/chroma-core/chroma:0.3.14
    platform: linux/amd64
    volumes:
      - ./chroma:/chroma
      - ./index_data:/index_data
    command: uvicorn chromadb.app:app --reload --workers 1 --host 0.0.0.0 --port 8000 --log-config log_config.yml
    environment:
      - CHROMA_DB_IMPL=clickhouse
      - CLICKHOUSE_HOST=clickhouse
      - CLICKHOUSE_PORT=8123
    ports:
      - 3000:8000
    depends_on:
      - clickhouse
    networks:
      - default
The chroma folder contains the cloned project of chroma...
Versions
chroma 0.3.14, Python 3.9.7, Docker Desktop 4.17.0 on an Apple M1 Max (macOS 13.2.1) with 64 GB RAM
Relevant log output
Current output of docker-compose -f docker-compose-chrome.yaml up is:
WARN[0000] The "Q" variable is not set. Defaulting to a blank string.
WARN[0000] The "Q" variable is not set. Defaulting to a blank string.
WARN[0000] The "Q" variable is not set. Defaulting to a blank string.
WARN[0000] Found orphan containers ([sia-alpha-demo-sia-frontend-1 sia-alpha-demo-sia-backend-1]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 2/2
⠿ Container sia-alpha-demo-clickhouse-1 Created 0.1s
⠿ Container sia-alpha-demo-chroma_server-1 Created 0.1s
Attaching to sia-alpha-demo-chroma_server-1, sia-alpha-demo-clickhouse-1
sia-alpha-demo-clickhouse-1 | <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
sia-alpha-demo-clickhouse-1 | <jemalloc>: (This is the expected behaviour if you are running under QEMU)
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:37 INFO uvicorn.error Will watch for changes in these directories: ['/chroma']
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:37 INFO uvicorn.error Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:37 INFO uvicorn.error Started reloader process [1] using WatchFiles
sia-alpha-demo-clickhouse-1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
sia-alpha-demo-clickhouse-1 | Merging configuration file '/etc/clickhouse-server/config.d/backup_disk.xml'.
sia-alpha-demo-clickhouse-1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
sia-alpha-demo-clickhouse-1 | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
sia-alpha-demo-clickhouse-1 | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
sia-alpha-demo-clickhouse-1 | Processing configuration file '/etc/clickhouse-server/config.xml'.
sia-alpha-demo-clickhouse-1 | Merging configuration file '/etc/clickhouse-server/config.d/backup_disk.xml'.
sia-alpha-demo-clickhouse-1 | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
sia-alpha-demo-clickhouse-1 | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
sia-alpha-demo-clickhouse-1 | Processing configuration file '/etc/clickhouse-server/users.xml'.
sia-alpha-demo-clickhouse-1 | Merging configuration file '/etc/clickhouse-server/users.d/chroma.xml'.
sia-alpha-demo-clickhouse-1 | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:40 INFO chromadb.telemetry.posthog Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:40 INFO chromadb Running Chroma using direct local API.
sia-alpha-demo-chroma_server-1 | 2023-04-11 15:50:40 INFO chromadb Using Clickhouse for database
sia-alpha-demo-chroma_server-1 | qemu: uncaught target signal 11 (Segmentation fault) - core dumped
@LuisMalhadas how much memory do you have allocated to your Docker VM? You'll likely need 4 GB plus however large your largest collection is (4 bytes * dimensionality * number of vectors).
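The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The numbers below (384-dimensional embeddings, as the default Sentence Transformers model produces, and a hypothetical 1M-vector collection) are illustrative assumptions, not values from this thread:

```python
# Rough memory estimate per the rule of thumb above:
# 4 bytes per float32 component * dimensionality * number of vectors,
# plus ~4 GB of baseline headroom for the server itself.

BASELINE_GB = 4  # rough headroom suggested above

def estimated_memory_gb(dimensionality: int, num_vectors: int) -> float:
    """Estimated RAM needed, in GB, for a collection of `num_vectors` embeddings."""
    index_bytes = 4 * dimensionality * num_vectors
    return BASELINE_GB + index_bytes / 1024**3

# Example: 1,000,000 vectors at 384 dimensions (the all-MiniLM-L6-v2
# default dimensionality) is about 1.43 GB of raw vectors on top of
# the 4 GB baseline.
print(round(estimated_memory_gb(384, 1_000_000), 1))  # → 5.4
```

By this estimate, an empty or tiny collection should fit comfortably in a 4 GB VM, which is why the segfaults reported below with empty collections point at something other than raw vector memory.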
In that case, there is 32 GB of memory allocated (available) to Docker Desktop. The only thing running besides Chroma is a Portainer instance. The collection is empty at the beginning.
Same situation here. 16GB RAM, 16 vCPU via k8s. Working on testing 10GB PVC to see if this solves the issue.
@DylanAlloy how large is your largest collection in terms of number of embeddings? what is their dimensionality?
This is occurring with a "blank slate" collection, inserting only the following document with the default SentenceTransformer embedding:
page_content='December 14, 2022 Chair Powell’s Press Conference PRELIMINARY' metadata={}
I will also note that once this happens, the server stops responding completely, i.e. .heartbeat() and everything else on *:8000 returns 500 from that point on. I have to manually restart it to get back to a healthy state.
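A quick way to tell whether the server is in this wedged state is a heartbeat probe. This is a stdlib-only sketch; it assumes the v1 REST API's /api/v1/heartbeat route, so treat the exact URL as an assumption for your Chroma version:

```python
import urllib.request
import urllib.error

def chroma_is_healthy(host: str = "localhost", port: int = 8000,
                      timeout: float = 3.0) -> bool:
    """Return True if the Chroma server answers its heartbeat endpoint.

    Assumes the /api/v1/heartbeat route; adjust for your Chroma version.
    """
    url = f"http://{host}:{port}/api/v1/heartbeat"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ConnectionError, OSError):
        # Covers "Remote end closed connection without response",
        # refused connections, and timeouts alike.
        return False
```

Polling something like this in a readiness probe, or before issuing add() calls, at least distinguishes "server is down" from "request was rejected".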
Since I'm developing the Helm chart responsible for the deployment on my end (and not so focused on the collection itself), everything is ephemeral and small enough that the issue can be reproduced by inserting any data at all. I'm working with many terabytes of implied storage space in the cluster, so I'm not confident which direction to take to solve this.
The last thing I tried was editing the Dockerfile to create a new directory at /chroma/chroma-data and mounting a 10Gi volume there via my values.yaml; this also does not change the behavior or solve the issue.
I am encountering something similar here. I am running ClickHouse via the Bitnami Helm chart and a k8s Deployment for Chroma, set up more or less the same as the compose file above.
The Chroma deployment works fine, and shortly after the ClickHouse pods spin up I can successfully hit the list-collections endpoint, but after a couple of requests the Chroma client starts raising 500s about the missing default.collections table:
❯ curl http://<myloadbalancerIP>:8000/api/v1/collections
{"error":"DatabaseError(\":HTTPDriver for http://clickhouse:8123/ returned response code 404)\\n Code: 60. DB::Exception: Table default.collections doesn't exist. (UNKNOWN_TABLE) (version 23.3.1.2823 (official build))\\n\")"}%
I will investigate my k8s setup more; any thoughts on the ClickHouse side are appreciated.
Edit: this failure is somehow intermittent. It seems effectively random whether or not Chroma can talk to ClickHouse; still investigating whether I could have created this error for myself somehow.
@zzstoatzz can you do a kubectl describe svc clickhouse -n <your ns> and share output?
Unfortunately, volumes were not the issue: I have attached one as index-data, but the problem seems to occur before any changes reach the database, since during reproduction nothing is written to the persistent data directory. Something appears to go wrong during data insertion itself.
I got the same issue. VM spec: 2 vCPUs, 4 GB RAM, but it works fine on my local machine.
from chromadb.config import Settings
from langchain.vectorstores import Chroma  # import assumed from context

chromadbSettings = Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port="8000",
)
vectordb = Chroma(embedding_function=embeddings, client_settings=chromadbSettings)
vectordb.add_texts(['hello', 'world', 'hi'])
............
return self.add_texts(texts, metadatas, **kwargs)
File "/home/giman/.local/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 159, in add_texts
self._collection.add(
File "/home/giman/.local/lib/python3.8/site-packages/chromadb/api/models/Collection.py", line 101, in add
self._client._add(
File "/home/giman/.local/lib/python3.8/site-packages/chromadb/api/fastapi.py", line 180, in _add
resp = requests.post(
File "/home/giman/.local/lib/python3.8/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/home/giman/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/giman/.local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/home/giman/.local/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/home/giman/.local/lib/python3.8/site-packages/requests/adapters.py", line 501, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
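When the server drops connections like the traceback above shows, a small retry-with-backoff wrapper on the calling side can at least absorb transient drops and surface persistent ones cleanly. This is a generic sketch, not part of chromadb or langchain, and the `vectordb` usage in the comment is hypothetical:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1, retry_on=(ConnectionError,)):
    """Call fn(), retrying with exponential backoff on connection errors.

    Re-raises the last error if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical usage around the failing call from the traceback above:
#   with_retries(lambda: vectordb.add_texts(['hello', 'world', 'hi']))
```

Note that if the server has actually segfaulted (as in the logs above), retries only delay the inevitable; they help with the intermittent, load-related drops, not the hard crashes.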
oh, it's working fine now after I upgrade to 4vcpu, 16gb RAM. ;)
Wow wtf, you need an entire 4gb of free memory just to run a single docker image? What an absolute hog docker is...
I ran into a similar issue and managed to get around it by eschewing Docker and just running "bare metal". This can be done from the root of this repo with:
pip install -r requirements.txt
IS_PERSISTENT=TRUE uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config log_config.yml
@ibeckermayer we are actually going to update the docs to recommend this path as well as make it a little easier (less flags and such). great minds think alike!
Closing this issue as Chroma's footprint has become much smaller since this thread was opened. Feel free to ping me to re-open it if desired!
I'm still experiencing the issue with the latest version of Chroma. I'm running it locally on Docker, which has 12GB of memory and 6 CPUs. I've attempted using both large and small collections, but the problem persists in both cases. Sometimes it initially works for the first few queries and then hangs again.
The issue arises specifically when I utilize 'chromadb.HttpClient(host='localhost', port=8000)' with Docker. Everything works fine when I use 'chromadb.PersistentClient()'. @jeffchuber
@jeffchuber signal 11 (segfault) happened for me just now. Collection size ~30k tokens, using the default embedding model, ChromaDB v0.4.22.