langchain langchain-mistralai cannot pull tokenizer from huggingface 401

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Code:

from langchain_mistralai import MistralAIEmbeddings
import assistant.settings as settings

def getMistralEmbeddings():
    # MISTRAL_API_KEY is loaded from the environment; this works on my personal machine at the time of publishing this issue
    return MistralAIEmbeddings(mistral_api_key=settings.MISTRAL_API_KEY)

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/app/assistant_api.py", line 37, in <module>
    retriever = obtain_full_qdrant_tmdb()
  File "/app/assistant/rag/retrievers/qdrant_connector.py", line 30, in obtain_full_qdrant_tmdb
    embeddings = getMistralEmbeddings()
  File "/app/assistant/rag/embeddings/mistral_embeddings.py", line 5, in getMistralEmbeddings
    return MistralAIEmbeddings(mistral_api_key=settings.MISTRAL_API_KEY)
  File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 339, in __init__
    values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
  File "/usr/local/lib/python3.10/site-packages/pydantic/v1/main.py", line 1100, in validate_model
    values = validator(cls_, values)
  File "/usr/local/lib/python3.10/site-packages/langchain_mistralai/embeddings.py", line 86, in validate_environment
    values["tokenizer"] = Tokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1403, in hf_hub_download
    raise head_call_error
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1261, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1674, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 369, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 393, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-662165b4-2224fae43a813b360dc7b222;20b14ba7-ef96-4d6a-8bef-1fa42c4f9291)

Cannot access gated repo for url https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json. Repo model mistralai/Mixtral-8x7B-v0.1 is gated. You must be authenticated to access it. Traceback (most recent call last):

Description

This error and stack trace have been occurring since this afternoon when the app is deployed on a Kubernetes server. It looks like a bug to me because I cannot reproduce the error on my personal machine, even after deleting the virtual environment and the __pycache__ folders and reinstalling everything from requirements.txt.

I understand that I need to authenticate, but first, why, and second, how? I came across some solutions that put the Hugging Face token in the request header, but I don't know where to inject such a token when using langchain-mistralai.

System Info

aiohttp>=3.9.3
aiosignal>=1.3.1
annotated-types>=0.6.0
anyio>=4.3.0
async-timeout>=4.0.3
attrs>=23.2.0
certifi>=2024.2.2
charset-normalizer>=3.3.2
click>=8.1.7
dataclasses-json>=0.6.4
exceptiongroup>=1.2.0
faiss-cpu>=1.8.0
fastapi>=0.110.1
filelock>=3.13.4
frozenlist>=1.4.1
fsspec>=2024.3.1
greenlet>=3.0.3
grpcio>=1.62.1
grpcio-tools>=1.62.1
h11>=0.14.0
h2>=4.1.0
hpack>=4.0.0
httpcore>=1.0.5
httpx>=0.25.2
httpx-sse>=0.4.0
huggingface-hub>=0.22.2
hyperframe>=6.0.1
idna>=3.6
Jinja2>=3.1.3
joblib>=1.4.0
jsonpatch>=1.33
jsonpointer>=2.4
langchain>=0.1.15
langchain-community>=0.0.32
langchain-core>=0.1.41
langchain-mistralai>=0.1.1
langchain-text-splitters>=0.0.1
langsmith>=0.1.43
MarkupSafe>=2.1.5
marshmallow>=3.21.1
mistralai>=0.1.8
mpmath>=1.3.0
multidict>=6.0.5
mypy-extensions>=1.0.0
networkx>=3.3
numpy>=1.26.4
nvidia-cublas-cu12>=12.1.3.1
nvidia-cuda-cupti-cu12>=12.1.105
nvidia-cuda-nvrtc-cu12>=12.1.105
nvidia-cuda-runtime-cu12>=12.1.105
nvidia-cudnn-cu12>=8.9.2.26
nvidia-cufft-cu12>=11.0.2.54
nvidia-curand-cu12>=10.3.2.106
nvidia-cusolver-cu12>=11.4.5.107
nvidia-cusparse-cu12>=12.1.0.106
nvidia-nccl-cu12>=2.19.3
nvidia-nvjitlink-cu12>=12.4.127
nvidia-nvtx-cu12>=12.1.105
orjson>=3.10.0
packaging>=23.2
pandas>=2.2.1
pillow>=10.3.0
portalocker>=2.8.2
protobuf>=4.25.3
pyarrow>=15.0.2
pydantic>=2.6.4
pydantic_core>=2.16.3
python-dateutil>=2.9.0.post0
python-dotenv>=1.0.1
pytz>=2024.1
PyYAML>=6.0.1
qdrant-client>=1.8.2
redis>=5.0.3
regex>=2023.12.25
requests>=2.31.0
safetensors>=0.4.2
scikit-learn>=1.4.2
scipy>=1.13.0
sentence-transformers>=2.6.1
six>=1.16.0
sniffio>=1.3.1
SQLAlchemy>=2.0.29
starlette>=0.37.2
sympy>=1.12
tenacity>=8.2.3
threadpoolctl>=3.4.0
tokenizers>=0.15.2
torch>=2.2.2
tqdm>=4.66.2
transformers>=4.39.3
triton>=2.2.0
typing-inspect>=0.9.0
typing_extensions>=4.11.0
tzdata>=2024.1
urllib3>=2.2.1
uvicorn>=0.29.0
yarl>=1.9.4

Apr 18 '24 18:04 couardcourageux

MISTRAL_API_KEY isn't related to the Hugging Face Hub gate on the mistralai/Mixtral-8x7B-v0.1 model. We also ran into this and expected that setting HUGGINGFACEHUB_API_TOKEN, as described in the LangChain Hugging Face Endpoints doc, to an API token of a user with access would fix the issue. Unfortunately, it is still broken.
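
One possible explanation, offered as an assumption rather than a confirmed diagnosis: huggingface_hub looks for its token in HF_TOKEN (or the older HUGGING_FACE_HUB_TOKEN), whereas HUGGINGFACEHUB_API_TOKEN is a name used by LangChain's Hugging Face integrations. The sketch below copies the latter into HF_TOKEN when nothing else is set. `bridge_hf_token` is a hypothetical helper name, not part of any library.

```python
import os
from typing import MutableMapping, Optional

# Assumption: these are the variables huggingface_hub itself consults.
HUB_TOKEN_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")
# Assumption: this name is only read by LangChain's Hugging Face integrations.
LANGCHAIN_TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def bridge_hf_token(env: MutableMapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: expose a LangChain-style token under HF_TOKEN.

    Returns the name of the variable the token came from, or None if no
    token is set anywhere.
    """
    for name in HUB_TOKEN_VARS:
        if env.get(name):
            return name  # already visible to huggingface_hub, nothing to copy
    token = env.get(LANGCHAIN_TOKEN_VAR)
    if token:
        env["HF_TOKEN"] = token
        return LANGCHAIN_TOKEN_VAR
    return None
```

If this is used, it would need to run before MistralAIEmbeddings is constructed, since the tokenizer download happens during initialization.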

Apr 18 '24 19:04 reflection

yes, that's really strange

Apr 18 '24 19:04 couardcourageux

Closing this issue, as langchain-mistralai is for the Mistral API, not for Hugging Face. If there's a separate issue with Hugging Face endpoints, feel free to open one or :+1: an existing one.

Apr 18 '24 20:04 eyurtsev

@eyurtsev Anyone using from langchain_mistralai.embeddings import MistralAIEmbeddings will run into this issue when it first initializes and tries to download https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json from Hugging Face.
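
To check the gate independently of LangChain, a small probe along these lines may help. It is only a sketch: `resolve_url` and `probe` are hypothetical helper names, the URL pattern is copied from the error message in this issue, and the Bearer header is the usual way a Hugging Face token is sent, which is an assumption here rather than something the thread confirms.

```python
import urllib.error
import urllib.request
from typing import Optional


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the file URL in the same shape as the one in the error message."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def probe(url: str, token: Optional[str] = None, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    if token:
        # Assumption: the token is passed as a standard Bearer header.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. the 401 reported in this issue


TOKENIZER_URL = resolve_url("mistralai/Mixtral-8x7B-v0.1", "tokenizer.json")
```

Calling `probe(TOKENIZER_URL)` from inside the affected pod, with and without a token, should show whether the gate itself is what the deployment is hitting.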

Apr 18 '24 20:04 reflection

Just ran into this issue. Anyone have a workaround for this?

Apr 18 '24 23:04 hslee16

For those who are facing this issue, I fixed it by doing the following:

1. Create an account at huggingface.co if you don't already have one
2. Create a new access token at https://huggingface.co/settings/tokens
3. Accept the "terms" at https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/ for using Mixtral
4. Add an HF_TOKEN environment variable with the token created in step 2 as the value

Redeploy if required.
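
For deployments, it may be worth failing early with a readable message when HF_TOKEN is missing, instead of surfacing the GatedRepoError from deep inside initialization. A minimal sketch, assuming the token is supplied through the environment as in the steps above; `require_hf_token` and `get_mistral_embeddings` are hypothetical names, not library functions.

```python
import os
from typing import Mapping


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN value, or raise a clear error if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; the Mixtral tokenizer repo is gated, "
            "so the tokenizer download is likely to fail with a 401."
        )
    return token


def get_mistral_embeddings(api_key: str):
    """Check for the token first, then build the embeddings object."""
    require_hf_token()
    # Imported lazily so this module can be loaded without langchain-mistralai.
    from langchain_mistralai import MistralAIEmbeddings

    return MistralAIEmbeddings(mistral_api_key=api_key)
```

On Kubernetes, the variable would typically come from a Secret mapped into the container's environment. The token also has to belong to an account that has accepted the model terms.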

Apr 19 '24 00:04 hslee16

This is definitely a bug! Working on a fallback for folks without huggingface set up
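
As an illustration only, and not necessarily what the maintainers ended up shipping, a fallback could take roughly this shape: try to load the real tokenizer, and if that fails, fall back to a stand-in that treats every character as one token. `CharFallbackTokenizer` and `load_tokenizer` are hypothetical names.

```python
import warnings
from typing import Callable, List


class CharFallbackTokenizer:
    """Hypothetical stand-in tokenizer: one token per character."""

    def encode_batch(self, texts: List[str]) -> List[List[str]]:
        return [list(text) for text in texts]


def load_tokenizer(loader: Callable[[], object]) -> object:
    """Call `loader`; if it raises, warn and return the character fallback."""
    try:
        return loader()
    except Exception as err:  # e.g. a gated-repo 401 or no network access
        warnings.warn(f"Could not load tokenizer ({err!r}); using character fallback.")
        return CharFallbackTokenizer()


# Usage sketch (requires the `tokenizers` package and network access):
# from tokenizers import Tokenizer
# tok = load_tokenizer(lambda: Tokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1"))
```

Note that the real tokenizer's `encode_batch` returns encoding objects rather than plain lists, so calling code would have to handle both shapes. Character counts are also only a rough stand-in for token counts.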

Apr 19 '24 01:04 efriis

I just ran into this problem. hslee16's fix seems to work though!

Apr 20 '24 18:04 bootstrapM

@efriis the issue still persists. Is there any fix that doesn't require setting HF_TOKEN? Thank you

Jul 19 '24 12:07 codebrain001
