
ValueError: invalid literal for int() with base 10: '0<|im_end|>' when using map_rerank

Open zhaoxin-jia-tfs opened this issue 2 years ago • 2 comments

System Info

macOS

Who can help?

No response

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [x] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [x] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

llm = AzureOpenAI(deployment_name=deployment, model_name="text-davinci-003", temperature=0, max_tokens=500)
llm_chain = load_qa_chain(llm, verbose=True, chain_type="map_rerank")

and when I run ch = llm_chain.run(input_documents=context, question=question)

it throws an exception at

zip(typed_results, docs), key=lambda x: -int(x[0][self.rank_key])
ValueError: invalid literal for int() with base 10: '0<|im_end|>'
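For reference, that line is the sort inside MapRerankDocumentsChain._process_results, which calls int() on the score the output parser extracted. With an Azure chat-model deployment the completion can end with the ChatML stop token <|im_end|>, which ends up inside the score string. A minimal illustration of just the failing conversion (the variable names below are mine, not LangChain's):

import re

score = "0<|im_end|>"   # what the output parser captured for the rank key
try:
    int(score)          # the conversion map_rerank's sort performs
except ValueError as err:
    print(err)          # invalid literal for int() with base 10: '0<|im_end|>'

# One possible sanitization before converting: keep only the leading digits.
digits = re.match(r"\d+", score)
print(int(digits.group()) if digits else None)   # 0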

Expected behavior


When I change chain_type to "stuff" it works, but I currently want to use map_rerank. Can anyone help me?

zhaoxin-jia-tfs avatar Jun 15 '23 08:06 zhaoxin-jia-tfs

Generated by a 🤖

Answer

The issue seems to be related to the parsing of the output from the language model. The output contains the special token '<|im_end|>' after the score, so converting the parsed score to an integer fails.
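For what it's worth, the score extraction in question looks roughly like the sketch below. This is a paraphrase under my own assumptions, not the exact LangChain source, but it shows how the trailing token ends up in the value that is later passed to int():

from langchain.output_parsers import RegexParser

# Roughly the shape of map_rerank's default output parsing:
# "<answer>\nScore: <number>" split into two capture groups.
parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",   # (.*) also swallows anything after the number
    output_keys=["answer", "score"],
)

result = parser.parse("Paris\nScore: 100<|im_end|>")
print(result["score"])   # '100<|im_end|>'  ->  int() on this raises ValueError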

This response is meant to be helpful and save users time. It is not meant to be a precise solution.

Please leave a 👍 if this is helpful and 👎 if it is irrelevant

devstein avatar Jun 16 '23 01:06 devstein

This token is returned by AzureOpenAI; it is described in the Azure documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-studio [screenshot from the Azure docs]

It happens when we use map_rerank. I think this is a common issue.
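If the deployment behind AzureOpenAI is a chat model served through the completions endpoint (as in that quickstart), one possible mitigation is to ask the API itself to stop at the ChatML token so it never appears in the completion text. A sketch, assuming your LangChain version forwards model_kwargs to the Azure completions call and that no other stop sequence is being passed:

from langchain.llms import AzureOpenAI
from langchain.chains.question_answering import load_qa_chain

deployment = "your-deployment-name"   # placeholder

# Assumption: sending '<|im_end|>' as a stop sequence keeps the token out of the
# completion, so map_rerank's score parsing only ever sees plain digits.
llm = AzureOpenAI(
    deployment_name=deployment,
    model_name="text-davinci-003",
    temperature=0,
    max_tokens=500,
    model_kwargs={"stop": ["<|im_end|>"]},
)
llm_chain = load_qa_chain(llm, verbose=True, chain_type="map_rerank")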

zhaoxin-jia-tfs avatar Jun 16 '23 02:06 zhaoxin-jia-tfs

@zhaoxin-jia-tfs @devstein any update on this?

pradeepdev-1995 avatar Jul 18 '23 13:07 pradeepdev-1995

I'm also experiencing the same issue when I use chain types other than 'stuff'; it only happens with the non-'stuff' chain types. It would be helpful to get some attention on this, as it appears to be a common issue. @hwchase17 @eyurtsev @nfcampos


File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/langchain/chains/combine_documents/map_rerank.py:192, in MapRerankDocumentsChain._process_results.<locals>.<lambda>(x)
    185 def _process_results(
    186     self,
    187     docs: List[Document],
    188     results: Sequence[Union[str, List[str], Dict[str, str]]],
    189 ) -> Tuple[str, dict]:
    190     typed_results = cast(List[dict], results)
    191     sorted_res = sorted(
--> 192         zip(typed_results, docs), key=lambda x: -int(x[0][self.rank_key])
    193     )
    194     output, document = sorted_res[0]
    195     extra_info = {}

ValueError: invalid literal for int() with base 10:

VivekLokesh avatar Aug 12 '23 01:08 VivekLokesh

For some days now I've been trying to solve this issue when using map_rerank. I can't retrieve the confidence score for the answer: the chain tries to convert the score string to an integer and I get an error. I couldn't find a way to get back the raw string.

So I tried to use RegexParser with a custom prompt as shown in the documentation, but I still get the same error. Here is the example:

from langchain.output_parsers import RegexParser

output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer In Italian: [answer here]
Score: [score between 0 and 100]

Begin!

Context:

{context}

Question: {question}
Helpful Answer In Italian:"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
    output_parser=output_parser,
)

chain = load_qa_chain(OpenAI(temperature=0), chain_type="map_rerank", return_intermediate_steps=True, prompt=PROMPT)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

https://python.langchain.com/docs/use_cases/question_answering/how_to/question_answering
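One detail worth checking in the snippet above: the documented regex's (.*) for the score also captures a trailing <|im_end|>, which is exactly what int() later chokes on. A narrower pattern that only captures digits might avoid that; this is a sketch, not a confirmed fix:

from langchain.output_parsers import RegexParser

# (\d+) stops at the first non-digit, so stop-token debris such as '<|im_end|>'
# never ends up in the score field.
output_parser = RegexParser(
    regex=r"(.*?)\nScore:\s*(\d+)",
    output_keys=["answer", "score"],
)

print(output_parser.parse("Rome\nScore: 100<|im_end|>"))
# {'answer': 'Rome', 'score': '100'}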

ggnicolau avatar Sep 05 '23 22:09 ggnicolau

Hi, @zhaoxin-jia-tfs,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised involves a ValueError when using the "map_rerank" chain_type in the AzureOpenAI function. Other users have also reported experiencing the same issue, and there has been some discussion around a potential explanation related to the parsing of the output from the language model. As of now, the issue remains unresolved and is awaiting further updates from the maintainers.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, kindly let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

dosubot[bot] avatar Dec 06 '23 17:12 dosubot[bot]