Chroma w/o Clickhouse
Hey there, I am scoping a variety of lightweight vector stores and Chroma caught my attention for its batteries-included approach. I was surprised to see ClickHouse as a requirement and poked through the code a bit; it looks like it's currently baked in pretty deeply. I was wondering if you have any plans to make DuckDB the default so the default install is more lightweight, and then offer ClickHouse or other DBs as extras?
@totalhack do you mean in the requirements.txt?
I see clickhouse_connect in both requirements.txt and pyproject.toml, and it looks like your duckdb code depends on your clickhouse integration code to some extent as well (hence my "baked in" comment). Is there some way to install and run Chroma without clickhouse that I missed?
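For anyone who wants to check this against whatever version they have installed, here is a minimal sketch that reads a package's declared requirements from its installed metadata. It uses only the standard library; the helper names `declared_requirements` and `requires` are made up for illustration, and the result depends entirely on the installed release.

```python
import re
from importlib import metadata


def declared_requirements(package: str) -> list[str]:
    """Return the requirement strings a package declares, or [] if it is not installed."""
    try:
        # metadata.requires() returns None when the package declares no requirements.
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


def requires(package: str, dependency: str) -> bool:
    """True if `dependency` is named among the declared requirements of `package`."""
    wanted = dependency.lower().replace("_", "-")
    for req in declared_requirements(package):
        # Requirement strings look like "name>=1.0; extra == 'x'"; keep only the name.
        name = re.split(r"[\s;<>=!~\[(]", req, maxsplit=1)[0]
        if name.lower().replace("_", "-") == wanted:
            return True
    return False


if __name__ == "__main__":
    # Prints True or False depending on the chromadb release installed locally.
    print(requires("chromadb", "clickhouse-connect"))
```

This only inspects what the package declares; it does not tell you whether the code path that uses the dependency is actually exercised at runtime.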
We are working on a large refactor to remove clickhouse entirely, https://github.com/chroma-core/chroma/pull/214, and make the backend horizontally scalable and serverless.
To confirm, you want to use in-memory chroma, but want to keep the total bundle size to a minimum?
Yea I'm ideally looking for a solution that can start very lightweight but has room to grow if needed. I saw elsewhere that you are also maybe removing sentence-transformers. What would your default embedding solution be then?
I am working on adding some optional NLP-based features to this data warehousing tool. I am hoping to make the default vector store as lightweight as possible while still avoiding reinventing the wheel. Chroma looks close to a fit.
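To illustrate the "optional features" idea, a library can probe for a backend before importing it, so the vector store stays an opt-in extra rather than a hard requirement. This is only a sketch using the standard library; `backend_available` and `pick_vector_backend` are hypothetical helper names, not part of Chroma or any existing tool.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """True if a top-level module can be found, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_vector_backend(preferred=("chromadb",)) -> str:
    """Return the first installed backend name, or "none" if no optional backend is present."""
    for name in preferred:
        if backend_available(name):
            return name
    return "none"


if __name__ == "__main__":
    # The NLP features would be enabled only when this is not "none".
    print(pick_vector_backend())
```

The check is cheap because `find_spec` does not execute the module, so a missing backend costs nothing at import time for users who never enable the feature.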
Hi @totalhack, you can run Chroma without ClickHouse; that's not a required path. Unfortunately we don't make the dependencies optional. Is that what you are looking for? #267 is our approach to removing the dependency on sentence-transformers and replacing its dependency tree with a much lighter one, once it lands.
Yea @HammadB, a combo of the two. I'm trying to minimize dependencies to reduce the chances of upstream conflicts for consumers of the library I'm developing, and I'm also looking for something lightweight as a database to start with, but with the ability to be more than just a toy solution (Chroma seems close to a fit for this). Do you have a separate Chroma client install if I wanted to run the Chroma server as a separate Docker service? I didn't see one, which prevented that workaround (you have to do the full Chroma install to get the Chroma client).
@totalhack we have this now! https://pypi.org/project/chromadb-client/
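For readers landing here later, a rough sketch of what using the thin client against a separately running server might look like. It assumes `pip install chromadb-client` and a Chroma server reachable at `localhost:8000` (both the host and the port are assumptions, as is the collection name `demo`). Embeddings are passed explicitly, on the assumption that the thin client does not bundle a default embedding function.

```python
def connect_to_chroma(host: str = "localhost", port: int = 8000):
    """Return an HTTP client for a Chroma server running as a separate service."""
    # Imported lazily so this module still loads when the client is not installed.
    import chromadb  # provided by `pip install chromadb-client`

    return chromadb.HttpClient(host=host, port=port)


def example_roundtrip(client):
    """Add one document with a precomputed embedding, then query it back."""
    collection = client.get_or_create_collection("demo")  # hypothetical collection name
    collection.add(
        ids=["a"],
        embeddings=[[0.1, 0.2, 0.3]],  # supplied by the caller, not computed here
        documents=["hello"],
    )
    return collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=1)
```

With a server running, `example_roundtrip(connect_to_chroma())` would exercise the whole path; without one, creating the client is expected to fail with a connection error.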