langchain
langchain-mistralai cannot pull tokenizer from huggingface 401
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Code:
from langchain_mistralai import MistralAIEmbeddings
import assistant.settings as settings

def getMistralEmbeddings():
    # MISTRAL_API_KEY is a well-defined variable read from the environment;
    # this works on my personal machine at the time I'm publishing the issue.
    return MistralAIEmbeddings(mistral_api_key=settings.MISTRAL_API_KEY)
Error Message and Stack Trace (if applicable)
Traceback (most recent call last):
File "/app/assistant_api.py", line 37, in
Cannot access gated repo for url https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json. Repo model mistralai/Mixtral-8x7B-v0.1 is gated. You must be authenticated to access it. Traceback (most recent call last):
Description
This error and stack trace have been occurring since this afternoon when the app is deployed on a Kubernetes server. It seems to me to be a bug because I cannot reproduce the error on my personal machine, even after deleting the virtual environment and the pycache folders and reinstalling everything from requirements.txt.
I know I should authenticate, but first, why, and second, how? I came across some solutions where you put your Hugging Face token in the header of the request, but I don't know where to inject such a token when using langchain-mistralai.
System Info
aiohttp>=3.9.3 aiosignal>=1.3.1 annotated-types>=0.6.0 anyio>=4.3.0 async-timeout>=4.0.3 attrs>=23.2.0 certifi>=2024.2.2 charset-normalizer>=3.3.2 click>=8.1.7 dataclasses-json>=0.6.4 exceptiongroup>=1.2.0 faiss-cpu>=1.8.0 fastapi>=0.110.1 filelock>=3.13.4 frozenlist>=1.4.1 fsspec>=2024.3.1 greenlet>=3.0.3 grpcio>=1.62.1 grpcio-tools>=1.62.1 h11>=0.14.0 h2>=4.1.0 hpack>=4.0.0 httpcore>=1.0.5 httpx>=0.25.2 httpx-sse>=0.4.0 huggingface-hub>=0.22.2 hyperframe>=6.0.1 idna>=3.6 Jinja2>=3.1.3 joblib>=1.4.0 jsonpatch>=1.33 jsonpointer>=2.4 langchain>=0.1.15 langchain-community>=0.0.32 langchain-core>=0.1.41 langchain-mistralai>=0.1.1 langchain-text-splitters>=0.0.1 langsmith>=0.1.43 MarkupSafe>=2.1.5 marshmallow>=3.21.1 mistralai>=0.1.8 mpmath>=1.3.0 multidict>=6.0.5 mypy-extensions>=1.0.0 networkx>=3.3 numpy>=1.26.4 nvidia-cublas-cu12>=12.1.3.1 nvidia-cuda-cupti-cu12>=12.1.105 nvidia-cuda-nvrtc-cu12>=12.1.105 nvidia-cuda-runtime-cu12>=12.1.105 nvidia-cudnn-cu12>=8.9.2.26 nvidia-cufft-cu12>=11.0.2.54 nvidia-curand-cu12>=10.3.2.106 nvidia-cusolver-cu12>=11.4.5.107 nvidia-cusparse-cu12>=12.1.0.106 nvidia-nccl-cu12>=2.19.3 nvidia-nvjitlink-cu12>=12.4.127 nvidia-nvtx-cu12>=12.1.105 orjson>=3.10.0 packaging>=23.2 pandas>=2.2.1 pillow>=10.3.0 portalocker>=2.8.2 protobuf>=4.25.3 pyarrow>=15.0.2 pydantic>=2.6.4 pydantic_core>=2.16.3 python-dateutil>=2.9.0.post0 python-dotenv>=1.0.1 pytz>=2024.1 PyYAML>=6.0.1 qdrant-client>=1.8.2 redis>=5.0.3 regex>=2023.12.25 requests>=2.31.0 safetensors>=0.4.2 scikit-learn>=1.4.2 scipy>=1.13.0 sentence-transformers>=2.6.1 six>=1.16.0 sniffio>=1.3.1 SQLAlchemy>=2.0.29 starlette>=0.37.2 sympy>=1.12 tenacity>=8.2.3 threadpoolctl>=3.4.0 tokenizers>=0.15.2 torch>=2.2.2 tqdm>=4.66.2 transformers>=4.39.3 triton>=2.2.0 typing-inspect>=0.9.0 typing_extensions>=4.11.0 tzdata>=2024.1 urllib3>=2.2.1 uvicorn>=0.29.0 yarl>=1.9.4
MISTRAL_API_KEY here isn't related to the Hugging Face Hub gate on the mistralai/Mixtral-8x7B-v0.1 model. We also ran into this, and expected that setting HUGGINGFACEHUB_API_TOKEN, as in the LangChain Hugging Face Endpoints doc, with an API token of a user with access would fix this issue. Unfortunately, it is still broken.
yes, that's really strange
Closing this issue as langchain-mistralai is for the Mistral API, not for Hugging Face. If there's a separate issue with Hugging Face endpoints, feel free to open one or :+1: on it.
@eyurtsev Anyone using from langchain_mistralai.embeddings import MistralAIEmbeddings will run into this issue when it first initializes and tries to download https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json from Hugging Face.
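To check whether a given Hugging Face token can reach that file from the failing environment, something like the sketch below might help. It does not go through langchain-mistralai at all; it only builds a request against the URL from the error message, with the token in an `Authorization: Bearer` header. The helper names `build_tokenizer_request` and `probe` are made up for this example.

```python
import urllib.error
import urllib.request
from typing import Optional

# URL copied from the error message in this issue.
TOKENIZER_URL = (
    "https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.json"
)


def build_tokenizer_request(token: Optional[str]) -> urllib.request.Request:
    """Build a HEAD request for tokenizer.json, authenticated if a token is given."""
    request = urllib.request.Request(TOKENIZER_URL, method="HEAD")
    if token:
        # Hugging Face user access tokens are sent as a Bearer token.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def probe(token: Optional[str]) -> int:
    """Send the request and return the HTTP status code (hypothetical helper)."""
    try:
        with urllib.request.urlopen(build_tokenizer_request(token), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Gated or unauthorized access surfaces here as an HTTP error status.
        return err.code
```

Running `probe(None)` and `probe(your_token)` inside the pod should show whether the token and the accepted model terms are what make the difference.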
Just ran into this issue. Anyone have a workaround for this?
For those who are facing this issue, I fixed it by doing the following:
- Create an account at huggingface.co if you don't already have one
- Create a new access token at - https://huggingface.co/settings/tokens
- Accept the "terms" at https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/ for using mixtral
- Add an HF_TOKEN environment variable with the created token from step 2 as the value
Redeploy if required.
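If you go with this workaround, a small startup check can make a missing variable obvious instead of failing later with the gated-repo error. This is only a sketch: `require_hf_token` is a hypothetical helper, not part of LangChain or huggingface_hub, and it assumes the token is supplied through the HF_TOKEN environment variable as in the steps above.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create an access token at "
            "https://huggingface.co/settings/tokens, accept the terms for "
            "mistralai/Mixtral-8x7B-v0.1, and set HF_TOKEN in the deployment."
        )
    return token
```

Calling `require_hf_token()` once at application startup, before `MistralAIEmbeddings` is constructed, should surface a misconfigured Kubernetes secret or env entry right away.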
This is definitely a bug! Working on a fallback for folks without huggingface set up
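For anyone curious what such a fallback could look like, here is a rough sketch. It is not the maintainers' actual fix, and all names in it (`CharTokenizer`, `load_tokenizer_or_fallback`) are hypothetical. It assumes the tokenizer is only needed to estimate token counts, so a character-level stand-in is acceptable when the real one cannot be downloaded.

```python
import warnings
from typing import Any, Callable, List


class CharTokenizer:
    """Fallback tokenizer that treats every character as one token."""

    def encode_batch(self, texts: List[str]) -> List[List[str]]:
        return [list(text) for text in texts]


def load_tokenizer_or_fallback(loader: Callable[[], Any]) -> Any:
    """Try the real loader; on any failure, warn and return CharTokenizer."""
    try:
        return loader()
    except Exception as err:  # e.g. gated repo, no network, missing token
        warnings.warn(
            f"Could not load the tokenizer ({err}); "
            "falling back to character-level token estimates."
        )
        return CharTokenizer()
```

Here `loader` would wrap whatever call fetches the real tokenizer. Character counts are usually higher than real token counts, so any batching based on them would tend to be more conservative than necessary rather than exceed a limit.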
I just ran into this problem. hslee16's fix seems to work though!
@efriis the issue still persists. Is there any fix that doesn't require specifying HF_TOKEN? Thank you