Client/Server example

Open thefedoration opened this issue 2 years ago • 5 comments

Description of changes

Not backwards-compatible PR

This PR is inspired by https://github.com/chroma-core/chroma/issues/289 and updates the pip package to install only minimal dependencies by default: those needed to run as a client. Installing all dependencies can take 30+ minutes, which makes chromaDB a poor fit for prod environments where such long build times are not acceptable.

With this change, you can install either all ChromaDB dependencies with pip install chromadb[server], or just the client with pip install chromadb. Using the client-only install requires running another container with this repo as the server, and using the client in chroma_api_impl="rest" mode.

By splitting out the long-running dependency installs into their own container, we can ensure that apps running the client can have fast build times, and only require the long build times when there are actual updates to the server code.
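One way to express this split is with setuptools extras. The sketch below is an assumption about how the dependency lists might be organized, not the contents of this PR's setup file. The client list is the one given under Notes, `duckdb` stands in for the heavier server-only installs, and the helper name `requirements_for` is hypothetical.

```python
# Sketch only: hypothetical dependency lists, not the PR's actual setup.py.

# Base install (pip install chromadb): what the client is assumed to need.
INSTALL_REQUIRES = ["fastapi", "pandas", "posthog", "sentence-transformers"]

# Extra install (pip install chromadb[server]): heavy, server-only packages.
# duckdb is the only one named in this PR; a real list would be longer.
EXTRAS_REQUIRE = {"server": ["duckdb"]}


def requirements_for(extras=()):
    """Return the packages pip would be asked to install for the given extras."""
    requirements = list(INSTALL_REQUIRES)
    for extra in extras:
        requirements.extend(EXTRAS_REQUIRE[extra])
    return requirements


# In a setup.py these lists would be passed to setuptools.setup() as the
# install_requires and extras_require keyword arguments.
print(requirements_for())            # client only
print(requirements_for(["server"]))  # client + server
```

With this layout, an app that only needs the client never pulls in the packages listed under the `server` extra.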

This PR is meant as a suggestion and working example, but is not backwards compatible and a big ask, so I'm very open to other implementations of the same idea.

How to use

In requirements.txt (of app installing client):

# chromadb==0.3.20 # old way
chromadb @ git+https://github.com/thefedoration/chroma@client-server-dependencies # client-only mode

In docker-compose (to run chromadb server as a separate container):

# this will run chromadb as a service found at localhost:8001 or at chromadb:8000 on local network
services:
  chromadb:
    build: https://github.com/chroma-core/chroma.git#0.3.21
    ports:
      - "8001:8000"
    expose:
      - "8000"
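Before creating a client, an app may want to wait until the server container accepts connections. The helper below is a minimal sketch using only the standard library. The name `wait_for_port` is hypothetical, and it only checks that the TCP port is open, not that Chroma itself is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the compose file above, that would be `wait_for_port("localhost", 8001)` from the host machine, or `wait_for_port("chromadb", 8000)` from another container on the same compose network.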

Using chroma client in application

import os
import chromadb
from chromadb.config import Settings

# host defaults to chromadb for docker-compose example above, but for prod deployment set the CHROMADB_HOST env var
client = chromadb.Client(Settings(
        chroma_api_impl='rest',
        chroma_server_host=os.environ.get('CHROMADB_HOST', 'chromadb'),
        chroma_server_http_port="8000",
))

Usage with langchain

import os
from chromadb.config import Settings
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

chroma_settings = Settings(
      chroma_api_impl="rest",
      chroma_server_host=os.environ.get('CHROMADB_HOST', 'chromadb'),
      chroma_server_http_port="8000",
)

all_texts = ["text1", "text2"]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.create_documents(texts=all_texts)
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents, embeddings, client_settings=chroma_settings)
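Both snippets above build the same REST settings, so an app could keep them in one place. This is a minimal sketch: the helper name `chroma_client_kwargs` and the `CHROMADB_PORT` variable are hypothetical and not part of this PR; only `CHROMADB_HOST` comes from the examples above.

```python
import os


def chroma_client_kwargs(environ=None):
    """Build keyword arguments for chromadb.config.Settings in REST mode."""
    environ = os.environ if environ is None else environ
    return {
        "chroma_api_impl": "rest",
        # Defaults to the docker-compose service name used above.
        "chroma_server_host": environ.get("CHROMADB_HOST", "chromadb"),
        # CHROMADB_PORT is a hypothetical variable, not part of this PR.
        "chroma_server_http_port": environ.get("CHROMADB_PORT", "8000"),
    }


# Usage (requires chromadb to be installed):
#   from chromadb.config import Settings
#   settings = Settings(**chroma_client_kwargs())
print(chroma_client_kwargs({}))
```

The resulting `Settings` object could then be passed either to `chromadb.Client(...)` or as `client_settings=` to the langchain `Chroma` wrapper.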

Notes

  • This PR assumes the following dependencies are required to run the client only: fastapi, pandas, posthog, sentence-transformers. It's possible there are others depending on other methods you might be calling.
  • sentence-transformers and pandas are also big installs. It would be nice to move them out of the client install as well and keep them on the server.

Test plan

  • Test that pip install chromadb doesn't install huge dependencies like duckdb, and that all chromadb.Client functionality still works as expected.
  • Test that pip install chromadb[server] installs all dependencies needed to run chroma in local mode
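The first check could be automated by asking Python which heavy modules are importable in the client environment. This is a sketch, not a test from this PR: the helper name `importable` is hypothetical, and the list contains only `duckdb` because that is the one heavy dependency the test plan names.

```python
import importlib.util


def importable(module_names):
    """Return the subset of top-level module names that can be imported here."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is not None
    ]


# duckdb is the heavy dependency named in the test plan above.
HEAVY_MODULES = ["duckdb"]

# In a client-only environment this list is expected to be empty.
print(importable(HEAVY_MODULES))
```

Running the same check after pip install chromadb[server] should instead report `duckdb` as present.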

Documentation Changes

  • Would add something like the above "How to use" section to the deployment docs.

thefedoration, Apr 17 '23 21:04

@thefedoration this is interesting, thanks for taking a stab at this! we are still noodling how to best solve this - so will leave this open for now as a reference to the core team

jeffchuber, Apr 22 '23 00:04

@jeffchuber After running it for a while, the build target is still quite large due to the pandas and sentence-transformers dependencies, and I can see that in my deploy times. I think the problem would be better addressed with a completely new Python package that only makes REST calls to the server. Thanks for prioritizing this, looking forward to seeing what you folks come up with.

thefedoration, Apr 23 '23 19:04

@thefedoration yes definitely a priority! many people have asked for this - https://github.com/chroma-core/chroma/issues/289

jeffchuber, Apr 24 '23 13:04

@thefedoration thanks for looking at this! and @jeffchuber thanks for prioritizing. I'd love to see a working version pulled in.

saginawj, Apr 28 '23 16:04

This is a much-needed change. Would love to see this implemented, as it's next to impossible to have a viable client-side build for small applications right now. Cheers @thefedoration @jeffchuber 🍻

Atharva2628, Apr 30 '23 17:04

We are going to solve this a different way by releasing a pypi package that is just the python client.

jeffchuber, May 11 '23 17:05