chatgpt-retrieval-plugin

addition of chromadb increased docker image size from 1.5G to 6.31G

Open hansvdam opened this issue 2 years ago • 3 comments

I do like the addition of chroma-db (PR #59) to the retrieval plugin. Unfortunately, Chroma has a very heavy dependency tree (e.g. because of Torch). As a result, this addition has quadrupled the image size and build time. Although it is nice to have chromadb, if you do not use it you have to strip it out to arrive at a reasonable Docker image size. Any thoughts on this? (Use different branches?)

hansvdam, May 11 '23 09:05

I would just use the project as a template and customize it to my own needs, including only the dependencies I actually use in my deployment.

rankun203, May 13 '23 02:05

+1 on this.

It's unreasonable to have such a huge dependency for a feature that's optional.

You can quickly remove chroma by running poetry remove chromadb. However, I think not having chroma in the image should be the default in this case.

ejoebstl, May 13 '23 12:05

We're landing a fix for that shortly: https://github.com/chroma-core/chroma/pull/267

This will eliminate that entire dependency tree - the reason it currently exists is that chroma uses local sentence transformer embeddings by default. After the above change, these will be optional and we'll ship with a lightweight ONNX model.

atroyn, May 13 '23 23:05
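
For readers unfamiliar with the idea of making a heavy dependency optional, here is a rough sketch of the general pattern in Python. It is not Chroma's actual code: the function name pick_embedding_backend and the fallback label "lightweight-default" are hypothetical, and the only assumption taken from the thread is that sentence transformers are the heavy, optional backend.

```python
import importlib.util


def pick_embedding_backend(preferred="sentence_transformers"):
    """Return the name of the backend to use without importing anything heavy."""
    # find_spec only checks whether a top-level module can be found; it does
    # not execute it, so Torch is never loaded just to answer the question.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    # Hypothetical label for whatever small default the package ships with.
    return "lightweight-default"


if __name__ == "__main__":
    print(pick_embedding_backend())
```

With this kind of check, the heavy package only has to be installed by users who actually want it, and the default install stays small.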

The fix has landed - the dependencies which were causing the image to balloon are gone. Thanks for bearing with us!

https://github.com/chroma-core/chroma/pull/267

atroyn, May 24 '23 17:05

https://github.com/openai/chatgpt-retrieval-plugin/issues/292#issue-1731782640

hopefully this will help someone

masterkain, May 31 '23 18:05