fastembed [Bug]: downloaded embedding models should not go to a temporary folder by default

What happened?

Fastembed stores models in a /tmp/fastembed_cache folder by default, which is a pain to work with and wastes resources for no good reason. The tmp folder is cleaned up often, so clients end up redownloading these models needlessly. Transferring GBs over the network has a cost and should not be encouraged when it is not needed.

I have been using fastembed for some time (Python and Rust libs, thanks for the great work!) and never had this issue until now. Recently I have been greeted with the error below, which is odd because I have only run my script twice today, much less than usual:

2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.

What is the expected behaviour?

Embedding models are large, sometimes more than 1GB. Forcing users to download them every day is a waste of resources and time for everyone, and it could easily be avoided.

By default fastembed should store models in a non-tmp folder. There are standard places in the home folder for this, or at worst it could just create a ~/.fastembed folder.

Note also that how fastembed does caching is not well documented: I cannot find anything about it in the docs.

In my case I usually deploy this as a long-running service in Docker containers, so I just persist /tmp/fastembed_cache as a volume in my container, e.g. in the compose config:

    volumes:
      - ./.fastembed_cache:/tmp/fastembed_cache

But I still face redownloads whenever I run things outside the container.

And imagine that every dev who did not set up this volume mapping is redownloading the models at every container restart...

Is there a compelling reason for causing all these downloads? Am I missing something?

A minimal reproducible example

No response

What Python version are you on? e.g. python --version

3.13

FastEmbed version

latest

What os are you seeing the problem on?

No response

Relevant stack traces and/or logs

2025-10-31 16:09:19.379 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d11f-1dfe1f91302937c221f352d0;5d8158c5-0e6a-4a0d-8c3e-667134688f78)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:19.380 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 3.0 seconds, 2 retries left.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d122-0242b0b53748607116edfd62;9b1a42c9-9ad0-449a-90e1-fe6649713886)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:22.494 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 9.0 seconds, 1 retries left.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:430 - Could not download model from HuggingFace: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/models/qdrant/bge-small-en-v1.5-onnx-q (Request ID: Root=1-6904d12b-6e1138c24b3eb6181d4b481d;91a1bccd-17b5-49b4-8bad-e0b9b3ada589)

We had to rate limit your IP (130.223.230.171). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API. Falling back to other sources.
2025-10-31 16:09:31.611 | ERROR    | fastembed.common.model_management:download_model:452 - Could not download model from either source, sleeping for 27.0 seconds, 0 retries left.

Oct 31 '25 15:10 vemonet

Hey, @vemonet

Thanks for using fastembed!

We are using /tmp cause it is not guaranteed that fastembed will have permission to write to any other default path.

Persisting models forever is debatable, because the cache might become bloated and users might then spend time figuring out why, and how to free disk space.

To change the default directory, you can pass cache_dir to classes like TextEmbedding.
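As a minimal sketch of that workaround: the ~/.cache/fastembed location and the helper names `persistent_cache_dir` and `load_embedder` are illustrative choices, not fastembed defaults, and the model name is only an example. The directory is picked once and passed through `cache_dir`; the fastembed import is deferred so the path helper can be used on its own.

```python
from pathlib import Path
from typing import Optional


def persistent_cache_dir(home: Optional[Path] = None) -> Path:
    """Return a cache directory under the user's home (hypothetical location)."""
    base = home if home is not None else Path.home()
    return base / ".cache" / "fastembed"


def load_embedder(model_name: str = "BAAI/bge-small-en-v1.5"):
    """Build a TextEmbedding whose model files are kept outside /tmp."""
    # Imported lazily so the path helper works without fastembed installed.
    from fastembed import TextEmbedding

    cache_dir = persistent_cache_dir()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # cache_dir is the constructor argument mentioned above.
    return TextEmbedding(model_name=model_name, cache_dir=str(cache_dir))
```

With this in place, a second call to `load_embedder()` should find the files already in that directory instead of downloading them again. fastembed also appears to read a `FASTEMBED_CACHE_PATH` environment variable when `cache_dir` is not given, which may be more convenient in containers; treat that as something to verify against your installed version.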

I agree that it could've been documented better! We'll try to improve here 👍

Nov 10 '25 09:11 joein

Thanks @joein !

Documentation of cache_dir would already be a good improvement

> We are using /tmp cause it is not guaranteed that fastembed will have permission to write to any other default path.

I beg to differ :)

There are standard approaches for this that many other tools use without issue, e.g.:

[x] ollama stores models in ~/.ollama
[x] huggingface stores models in ~/.cache/huggingface
[x] uv uses ~/.cache/uv
[x] pre-commit uses ~/.cache/pre-commit
[x] pipx uses ~/.local/pipx
[x] cargo uses ~/.cargo

So it should not be a big challenge to implement a more reliable, smarter storage approach for fastembed too:

1. Get the $HOME env var on Linux and macOS, or %USERPROFILE% on Windows
2. If you want to be even more thorough, you can follow the XDG Base Directory Specification: https://specifications.freedesktop.org/basedir/latest/
   - $XDG_DATA_HOME, $XDG_CACHE_HOME (a bit more tricky to implement, putting everything in ~/.fastembed/ would be easier)
3. If you can't write to the home folder or the XDG location, then fall back to /tmp
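
The fallback order above can be sketched roughly as follows. This only illustrates the proposal and is not fastembed's current behaviour: `resolve_cache_dir` and the `fastembed` subdirectory name are hypothetical, and for brevity it uses $XDG_CACHE_HOME and ~/.cache rather than a ~/.fastembed folder.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def _usable(path: Path) -> bool:
    """Try to create the directory and report whether it is writable."""
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError:
        return False
    return os.access(path, os.W_OK)


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the first writable location: XDG cache, home cache, then tmp."""
    env = os.environ if env is None else env
    candidates = []

    # The XDG spec says relative paths are invalid and should be ignored.
    xdg = env.get("XDG_CACHE_HOME")
    if xdg and os.path.isabs(xdg):
        candidates.append(Path(xdg) / "fastembed")

    # $HOME on Linux/macOS, %USERPROFILE% on Windows.
    home = env.get("HOME") or env.get("USERPROFILE")
    if home:
        candidates.append(Path(home) / ".cache" / "fastembed")

    for candidate in candidates:
        if _usable(candidate):
            return candidate

    # Last resort: the temporary folder, matching the path used today.
    fallback = Path(tempfile.gettempdir()) / "fastembed_cache"
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```

Passing the environment in as a mapping keeps the function easy to test, and users who cannot write to their home folder still end up with the existing /tmp behaviour.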

Nov 10 '25 12:11 vemonet