chroma
[Feature Request]: Ability to specify local_files_only
Describe the problem
I'd like to avoid making connections to remote servers every time I start my Python script to debug it. Not only does it make my script slower, but whenever a new version of the model is downloaded, my previous tests are no longer valid.
Also, should the software go into production, the script's ability to function should not depend on requests to outside servers after every restart.
Describe the proposed solution
I would like to have the ability to specify the "local_files_only" parameter here:
self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
    # add parameter here
)
Alternatives considered
No response
Importance
I cannot use Chroma without it
Additional Information
https://github.com/huggingface/transformers/commit/a143d9479e3908ebe5bb32e9689dbf7d24eb536c
Currently I have to turn off the Wi-Fi every time I start the script; otherwise it just hangs forever while requesting https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2/resolve/main/modules.json
read, ssl.py:1166
recv_into, ssl.py:1314
readinto, socket.py:706
_read_status, client.py:286
begin, client.py:325
getresponse, client.py:1390
getresponse, connection.py:466
_make_request, connectionpool.py:537
urlopen, connectionpool.py:793
send, adapters.py:486
send, _http.py:68
send, sessions.py:703
request, sessions.py:589
_request_wrapper, file_download.py:392
_request_wrapper, file_download.py:369
get_hf_file_metadata, file_download.py:1674
_inner_fn, _validators.py:119
hf_hub_download, file_download.py:1261
_inner_fn, _validators.py:119
load_file_path, util.py:551
_load_sbert_model, SentenceTransformer.py:1258
__init__, SentenceTransformer.py:197
__init__, embedding_functions.py:72 -> this is from venv/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py
__init__, embedding.py:9
init_context, app.py:86
main, app.py:106
_run, events.py:84
_run_once, base_events.py:1936
run_forever, base_events.py:608
run_until_complete, base_events.py:641
run, runners.py:118
run, runners.py:190
<module>, app.py:310
@simaotwx, let me try to clarify this. What you want to be able to do is use embedding models (let's say sentence transformers) locally, without having to go to the internet to fetch them every time you redeploy Chroma.
We have a PR about this: #1799
Until this gets merged for sentence transformers, you can mount a dir for the model cache at /root/.cache/huggingface/transformers/ (this is the default path for HF sentence transformers). This will allow you to store existing models or add new ones to the cache without having to go to the internet to fetch them.
Another alternative is to use the Chroma image chromadb/chroma:latest as a base and then add a layer on top of it with the models you need inside /root/.cache/huggingface/transformers/
@tazarov Not quite. I'm using the local embedded Chroma instance for development. The cache is there, but it still goes to the internet to check whether there are updates. If my internet is unstable (which, unfortunately, is quite common in Germany), that prevents me from starting the script. I'm not using the Docker container.
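For readers hitting the same hang, one possible workaround is sketched below. It is not something proposed in this thread: it relies on the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables that the Hugging Face libraries read, and it assumes the model is already in the local cache. The helper names `enable_hf_offline_mode` and `build_embedding_function` are hypothetical. The variables should be set before any Hugging Face library is imported, because they are read at import time.

```python
import os


def enable_hf_offline_mode(env=os.environ):
    """Ask Hugging Face libraries to rely on the local cache only.

    Hypothetical helper: it only sets two environment variables and
    returns the mapping it modified.
    """
    env["HF_HUB_OFFLINE"] = "1"        # read by huggingface_hub
    env["TRANSFORMERS_OFFLINE"] = "1"  # read by transformers
    return env


# Set the flags before chromadb / sentence-transformers are imported.
enable_hf_offline_mode()


def build_embedding_function():
    """Create the embedding function from the issue (not called here).

    The import is deferred so the offline flags are already in place.
    If the model is not in the local cache, loading is expected to fail
    instead of falling back to a download.
    """
    from chromadb.utils import embedding_functions

    return embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
    )
```

Whether this fully avoids the request to `modules.json` depends on the installed versions of `huggingface_hub` and `sentence-transformers`, so treat it as something to try rather than a guaranteed fix.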
@simaotwx, have you tried passing use_auth_token=False, which tells ST that you are ok with the local cache?
Where exactly do I pass this? See my example code, I'm using Chroma's embedding_functions.SentenceTransformerEmbeddingFunction
The ST EF supports kwargs: https://github.com/chroma-core/chroma/blob/e5ec1b39171f62db4efe549207e488bbbdb9a12c/chromadb/utils/embedding_functions.py#L66
You can call it like this:
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
    use_auth_token=False
)
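To illustrate what "supports kwargs" means here, the following is a minimal sketch of keyword-argument forwarding. It is not Chroma's actual implementation: `RecordingModel` is a hypothetical stand-in for `SentenceTransformer`, and `SketchEmbeddingFunction` is a hypothetical stand-in for the embedding function. The point is only that any extra keyword argument given to the wrapper ends up in the model constructor.

```python
from typing import Any, Callable


class RecordingModel:
    """Hypothetical stand-in for the real model class.

    It only records the arguments it was constructed with.
    """

    def __init__(self, model_name: str, device: str = "cpu", **kwargs: Any) -> None:
        self.model_name = model_name
        self.device = device
        self.kwargs = kwargs


class SketchEmbeddingFunction:
    """Hypothetical wrapper that forwards unknown kwargs to the model."""

    def __init__(
        self,
        model_name: str,
        device: str = "cpu",
        model_factory: Callable[..., Any] = RecordingModel,
        **kwargs: Any,
    ) -> None:
        # Everything the wrapper does not consume itself is passed through.
        self.model = model_factory(model_name, device=device, **kwargs)


ef = SketchEmbeddingFunction(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
    use_auth_token=False,
)
print(ef.model.kwargs)
```

With a wrapper shaped like this, any option the underlying model constructor accepts can be supplied without the wrapper having to name it explicitly.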
I don't have that yet:
I think I have to update my ChromaDB dependency. Thanks!
Yeah, this is in main but not in the latest release, 0.4.24.
Okay, I'll wait for the release
@simaotwx, a new version of Chroma was released last week. Please update your deps and try again.
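A quick way to confirm that the installed package is newer than 0.4.24 is sketched below, using only the standard library. The helper names are hypothetical, the version parsing is deliberately simple (it stops at the first non-numeric component), and it assumes the kwargs support shipped in the first release after 0.4.24, which the thread implies but does not state outright.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.4.24' into (0, 4, 24), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than(installed, baseline="0.4.24"):
    """Return True if `installed` is strictly newer than `baseline`."""
    return parse_version(installed) > parse_version(baseline)


def installed_chromadb_version():
    """Return the installed chromadb version string, or None if absent."""
    try:
        return metadata.version("chromadb")
    except metadata.PackageNotFoundError:
        return None


version = installed_chromadb_version()
if version is None:
    print("chromadb is not installed in this environment")
elif is_newer_than(version):
    print(f"chromadb {version} is newer than 0.4.24")
else:
    print(f"chromadb {version} should be upgraded")
```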