chatgpt-retrieval-plugin
addition of chromadb increased docker image size from 1.5G to 6.31G
I do like the addition of chromadb (PR #59) to the retrieval plugin. Unfortunately, Chroma has a very heavy dependency tree (e.g. because of Torch). This addition has quadrupled the image size and build time. Although it is nice to have chromadb, if you do not use it you have to strip it out to arrive at a reasonable Docker image size. Any thoughts on this? (Use different branches?)
I would just use the project as a template and customize to my own needs, only including used dependencies in my deployment.
+1 on this.
It's unreasonable to have such a huge dependency for a feature that's optional.
You can quickly remove chroma by running poetry remove chromadb. However, I think not having chroma in the image should be the default in this case.
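After removing it, a quick way to confirm the heavy packages are really gone from the environment is to check whether they can still be found. This is only a minimal sketch: the helper names `is_installed` and `report` are made up for illustration, and the default module list (`chromadb`, `torch`) is an assumption based on the packages mentioned in this thread.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(modules=("chromadb", "torch")):
    """Map each module name to whether it is present in this environment.

    The default names are assumptions taken from the discussion above.
    """
    return {name: is_installed(name) for name in modules}


if __name__ == "__main__":
    for name, present in report().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```

Running this inside the built container (or the Poetry virtualenv) should list both packages as not installed if the removal worked.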
We're landing a fix for that shortly: https://github.com/chroma-core/chroma/pull/267
This will eliminate that entire dependency tree - the reason it currently exists is that chroma uses local sentence transformer embeddings by default. After the above change, these will be optional and we'll ship with a lightweight ONNX model.
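For anyone wondering what "optional" can look like in practice, here is a rough sketch of the general lazy-import pattern: the heavy package is only imported when explicitly requested, and the code falls back to a light default when it is missing. This is not Chroma's actual code. The function `load_embedder`, its parameters, and the `"heavy"`/`"light"` labels are hypothetical and only illustrate the idea.

```python
import importlib


def load_embedder(prefer_heavy: bool = False,
                  heavy_module: str = "sentence_transformers"):
    """Return a (kind, module) pair.

    Hypothetical helper: the heavy module is imported only on request,
    so it does not need to be installed for the default path to work.
    """
    if prefer_heavy:
        try:
            # Deferred import: nothing heavy is loaded unless asked for.
            return "heavy", importlib.import_module(heavy_module)
        except ImportError:
            # Optional dependency is absent; fall through to the light default.
            pass
    return "light", None


if __name__ == "__main__":
    kind, _ = load_embedder()
    print(kind)
```

With this shape, the heavy dependency can be left out of the default install and the image, and only users who want it need to add it.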
The fix has landed - the dependencies which were causing the image to balloon are gone. Thanks for bearing with us!
https://github.com/chroma-core/chroma/pull/267
https://github.com/openai/chatgpt-retrieval-plugin/issues/292#issue-1731782640
hopefully this will help someone