langchain
HuggingFaceEndpoint returning buggy responses and prompt template back
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
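from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEndpoint

# api_token, BASE_TEMPLATE, retrieve_context, user_query and vector are defined elsewhere (not shown).
llm = HuggingFaceEndpoint(
    repo_id='meta-llama/Llama-3.1-8B-Instruct',
    huggingfacehub_api_token=api_token,
)

llm_chain = (
    {
        # Fetch the context for the question from the vector store.
        "context": lambda inputs: retrieve_context(inputs['question'], inputs['vector']),
        "question": RunnablePassthrough(),
    }
    | PromptTemplate(
        template=BASE_TEMPLATE,
        input_variables=["context", "question"],
    )
    | llm
    | StrOutputParser()
)

llm_chain.invoke({'question': user_query, 'vector': vector})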
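One detail of the chain above may be worth checking separately from the endpoint itself. In LCEL, a dict step runs every value against the same input, so `RunnablePassthrough()` under "question" would forward the whole `{'question': ..., 'vector': ...}` dict to the prompt rather than just the query string. The sketch below is a plain-Python stand-in for that fan-out, not LangChain code: `run_mapping` and `passthrough` are hypothetical helpers and the sample inputs are made up.

```python
from operator import itemgetter


def run_mapping(mapping, inputs):
    """Hypothetical stand-in for an LCEL dict step: every value receives the same input."""
    return {key: step(inputs) for key, step in mapping.items()}


def passthrough(value):
    """Hypothetical stand-in for RunnablePassthrough: returns its input unchanged."""
    return value


def fake_context(inputs):
    """Placeholder for retrieve_context; only reads the question."""
    return f"context for {inputs['question']}"


inputs = {"question": "What is RAG?", "vector": "<vector store>"}

# As written in the issue: "question" receives the entire input dict.
as_written = run_mapping({"context": fake_context, "question": passthrough}, inputs)

# Selecting the key explicitly forwards only the query string.
with_getter = run_mapping(
    {"context": fake_context, "question": itemgetter("question")}, inputs
)

print(as_written["question"])
print(with_getter["question"])
```

If this is what happens in the real chain, using `itemgetter("question")` in place of `RunnablePassthrough()` would be one way to pass only the query into the template. This is a guess from the posted code and does not by itself explain the repeated "[/INST]" output.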
Error Message and Stack Trace (if applicable)
As you can see in the LangSmith trace, it returned this output.
Description
I'm using HuggingFaceEndpoint for inference to avoid storing the model on my local machine, and I've noticed that it frequently gives buggy responses. I'm using it in a RAG pipeline, and it often just returns the entire base prompt template wrapped in [INST]...[/INST]. As the attached screenshot shows, it also returned "[/INST]" in a loop until the max token limit was reached.
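A possible factor, offered as a guess: the `[INST]...[/INST]` markers belong to the Llama 2 and Mistral instruct formats, while Llama 3.1 Instruct uses header tokens. If `BASE_TEMPLATE` (not shown in this issue) contains `[INST]` tags, the model may treat them as ordinary text and echo them back. Below is a minimal sketch of the Llama 3.1 layout; the function name `build_llama3_prompt` and the sample strings are illustrative and not part of the original code.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3.1 Instruct layout."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Leave the assistant header open so the model writes the answer next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Illustrative values; in the chain these would come from the retriever and the user.
prompt = build_llama3_prompt(
    system="Answer using only the provided context.",
    user="Context: LangChain is a framework.\n\nQuestion: What is LangChain?",
)
print(prompt)
```

Whether the serving endpoint prepends `<|begin_of_text|>` on its own is not checked here. An alternative that avoids hand-written markers is wrapping the endpoint in `ChatHuggingFace` from `langchain_huggingface`, which is meant to apply the model's own chat template; whether that resolves this particular behavior is untested.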
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 22.4.0: Mon Mar 6 21:00:17 PST 2023; root:xnu-8796.101.5~3/RELEASE_X86_64
Python Version: 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.1.0.2.5)]
Package Information
langchain_core: 0.3.19
langchain: 0.3.7
langchain_community: 0.3.7
langsmith: 0.1.143
langchain_huggingface: 0.1.2
langchain_text_splitters: 0.3.2
langchainhub: 0.1.21
Optional packages not installed
langgraph
langserve
Other Dependencies
aiohttp: 3.11.4
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.27.2
httpx-sse: 0.4.0
huggingface-hub: 0.26.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.11
packaging: 24.2
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.3.1
SQLAlchemy: 2.0.35
tenacity: 9.0.0
tokenizers: 0.20.3
transformers: 4.46.3
types-requests: 2.32.0.20241016
typing-extensions: 4.12.2