
Remove sentence-transformers as a hard requirement

Open jeffchuber opened this issue 1 year ago • 8 comments

Currently we use sentence-transformers as the default embedding model. However, this means that it and many of its dependencies are included in the project. Additionally, it downloads the model on start-up, which hurts startup time. Furthermore, it makes Chroma impossible to install in certain environments, such as Python 3.11.

Will close

  • https://github.com/chroma-core/chroma/issues/163
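For illustration, here is a minimal sketch of what making the package optional could look like: the import and the model load are deferred until an embedding is first requested. This is not Chroma's actual code; the class name `LazyEmbeddingFunction`, its parameters, and the choice of `all-MiniLM-L6-v2` as the model name are assumptions made for the example.

```python
import importlib


class LazyEmbeddingFunction:
    """Hypothetical wrapper: imports the heavy package only on first call."""

    def __init__(self, module_name="sentence_transformers",
                 model_name="all-MiniLM-L6-v2"):
        # Assumed defaults for illustration; nothing is imported or
        # downloaded at construction time.
        self._module_name = module_name
        self._model_name = model_name
        self._model = None

    def _load(self):
        # Import on demand so the package is only needed by users who
        # actually call this embedding function.
        try:
            module = importlib.import_module(self._module_name)
        except ImportError as exc:
            raise ImportError(
                f"{self._module_name} is not installed; "
                "install it to use this embedding function"
            ) from exc
        return module.SentenceTransformer(self._model_name)

    def __call__(self, texts):
        if self._model is None:
            self._model = self._load()
        # encode() returns an array; convert it to plain Python lists.
        return self._model.encode(list(texts)).tolist()
```

With this shape, creating the object is cheap and succeeds even when the package is missing; the import error only surfaces if the embedding function is actually called.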

jeffchuber avatar Mar 29 '23 05:03 jeffchuber

Is there any workaround for using chromadb with Python 3.11.x? I have a VS Code environment that is working well and I don't want to mess with it (still a newbie). I have been writing text-based AI code using chromadb in Colab, but there are local modes like the microphone and speaker that I need to use.

Tanzengeist avatar Mar 29 '23 16:03 Tanzengeist

@Tanzengeist we are prioritizing this and will follow up later today

jeffchuber avatar Mar 29 '23 17:03 jeffchuber

@jeffchuber Eagerly waiting for the solution. In the meantime, what alternative do you recommend so I can use chromadb in my codebase?

ayush-vibrant avatar Mar 29 '23 19:03 ayush-vibrant

Jeff, I’m sure you're all working hard on this. When you have a workaround, please send up a flare.

Tanzengeist avatar Mar 30 '23 03:03 Tanzengeist

#267 removes sentence-transformers, but unfortunately will still not unblock 3.11 as onnxruntime does not yet support it. With major packages like onnx and pytorch not supporting 3.11, it is hard for us to deliver models to users and support 3.11 until these dependencies do :(

HammadB avatar Mar 31 '23 16:03 HammadB

Works fine with:

ARCHFLAGS="-arch x86_64" pip install chromadb

See if that's of any use.

Reference: https://github.com/Yale-LILY/SummerTime/issues/116#issuecomment-984134322

kotakcloud avatar Apr 11 '23 06:04 kotakcloud

Any updates on removing sentence-transformers as a hard requirement?

RiccardoGrin avatar Apr 12 '23 22:04 RiccardoGrin

Hi! I'm interested in this solution. Do we have a workaround before this is released?

DiegoPiloni avatar Apr 19 '23 19:04 DiegoPiloni

Hi, the project does not seem to depend heavily on sentence-transformers. Will this dependency be removed from the requirements?

specter119 avatar May 18 '23 01:05 specter119

@specter119 yes in two ways.

  1. the default bundling will be switched to the trimmed down ONNX model https://github.com/chroma-core/chroma/pull/267
  2. we will ship a client-only build of chroma as a separate pypi project

both very soon

jeffchuber avatar May 18 '23 04:05 jeffchuber

@jeffchuber thx, sentence-transformers brings in a heavy dependency, which causes the Conda build to fail.

  • https://github.com/conda-forge/chromadb-feedstock/pull/6
  • Failed build

BTW, do the vector-storage features in LangChain depend on both the server and the client of chroma?

specter119 avatar May 18 '23 06:05 specter119

Good to know, I'm glad we are removing that.

LangChain by default uses the in-memory version of chroma, which is more of a library than a client or a server.

jeffchuber avatar May 19 '23 04:05 jeffchuber

chromadb-client fixed this, I think, for most users: https://pypi.org/project/chromadb-client/

jeffchuber avatar Jul 27 '23 18:07 jeffchuber