
`RetryOutputParser` error when used with `PydanticOutputParser`

Open lucasbasquerotto opened this issue 1 year ago • 2 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain rather than my code.
  • [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain.output_parsers import RetryOutputParser
from langchain_core.output_parsers.pydantic import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel
from langchain_openai import OpenAI

class TestModel(BaseModel):
    a: int
    b: str

data_pydantic = TestModel(a=1, b="2")

data_json = data_pydantic.json()

parser = PydanticOutputParser(pydantic_object=TestModel)

retry_parser = RetryOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))

retry_parser.parse_with_prompt(completion=data_json, prompt_value="Test prompt")  # works: completion is a JSON string

retry_parser.parse_with_prompt(completion=data_pydantic, prompt_value="Test prompt")  # Error: completion is a Pydantic model, not a str

Error Message and Stack Trace (if applicable)

ValidationError                           Traceback (most recent call last)
Cell In[3], line 20
     16 retry_parser = RetryOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))
     18 retry_parser.parse_with_prompt(completion=data_json, prompt_value="Test prompt")
---> 20 retry_parser.parse_with_prompt(completion=data_pydantic, prompt_value="Test prompt")

File c:\Users\Asus\anaconda3\envs\dev\Lib\site-packages\langchain\output_parsers\retry.py:89, in RetryOutputParser.parse_with_prompt(self, completion, prompt_value)
     87 while retries <= self.max_retries:
     88     try:
---> 89         return self.parser.parse(completion)
     90     except OutputParserException as e:
     91         if retries == self.max_retries:

File c:\Users\Asus\anaconda3\envs\dev\Lib\site-packages\langchain_core\output_parsers\json.py:218, in JsonOutputParser.parse(self, text)
    217 def parse(self, text: str) -> Any:
--> 218     return self.parse_result([Generation(text=text)])

File c:\Users\Asus\anaconda3\envs\dev\Lib\site-packages\langchain_core\load\serializable.py:120, in Serializable.__init__(self, **kwargs)
    119 def __init__(self, **kwargs: Any) -> None:
--> 120     super().__init__(**kwargs)
    121     self._lc_kwargs = kwargs

File c:\Users\Asus\anaconda3\envs\dev\Lib\site-packages\pydantic\main.py:341, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)

Description

The RetryOutputParser does not seem to work correctly when used with PydanticOutputParser. I suspect it won't work correctly whenever it is used with a parser that does not output a string.

In the code above, it works when receiving a string, but when receiving anything else, it throws:

ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)

In the context of a chain with a PydanticOutputParser, when the LLM call succeeds and the result has already been parsed into the Pydantic model, the retry parser throws an error.
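Based on the traceback above, the failure appears to come from the completion being passed into a Generation, whose text field is validated as a str. A minimal sketch of that assumed root cause, reusing data_json and data_pydantic from the example:

# Assumed root cause, inferred from the traceback: the parser wraps the
# completion in a Generation, and Generation.text must be a str.
from langchain_core.outputs import Generation

Generation(text=data_json)      # OK: data_json is a str
Generation(text=data_pydantic)  # ValidationError: str type expected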

I see no mention of this (the RetryOutputParser only accepting a string completion) in the docs: https://python.langchain.com/docs/modules/model_io/output_parsers/types/retry

I was able to avoid this issue by converting the completion value to a JSON string (shown below) when its type matches the expected Pydantic model.

from langchain_core.runnables import RunnableLambda, RunnableParallel

def parse_with_prompt(args):
    completion = args['completion']

    # If the main chain already produced the Pydantic model, serialize it back
    # to a JSON string so the retry parser receives the str it expects.
    if isinstance(completion, TestModel):
        args = args.copy()
        args['completion'] = completion.json(ensure_ascii=False)

    return retry_parser.parse_with_prompt(**args)

chain = RunnableParallel(
    completion=completion_chain, prompt_value=prompt
) | RunnableLambda(parse_with_prompt)

The problem is that this seems hackish, and I don't know whether it will remain portable across new versions of the parser. At least in the example in the docs, I see no reference to the parameters that should be passed to parse_with_prompt; I can see in the source code that they are completion: str and prompt_value: PromptValue, but I'm not sure whether that should be considered an implementation detail, given that the docs don't mention it. Furthermore, if this issue is fixed in a new version, I may end up converting the model to JSON when I shouldn't.

For now I'm not using the RetryOutputParser, because it does not seem production-ready yet (at least with a parser that does not output a string).

System Info

System Information
------------------
> OS:  Windows
> OS Version:  10.0.22621
> Python Version:  3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:23:48) [MSC v.1936 64 bit (AMD64)]

Package Information
-------------------
> langchain_core: 0.1.28
> langchain: 0.1.6
> langchain_community: 0.0.19
> langsmith: 0.1.14
> langchain_openai: 0.0.8
> langchainhub: 0.1.14
> langgraph: 0.0.28
> langserve: 0.0.46

lucasbasquerotto avatar Mar 15 '24 16:03 lucasbasquerotto

Here is the doc. Is there a better solution? https://api.python.langchain.com/en/latest/output_parsers/langchain.output_parsers.retry.RetryOutputParser.html#langchain.output_parsers.retry.RetryOutputParser

liugddx avatar Apr 14 '24 04:04 liugddx

@liugddx It seems that the docs specify the args as completion: str and prompt_value: PromptValue, so the code above could be considered OK (as long as this doesn't change in the future).
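For reference, a minimal sketch of calling parse_with_prompt with those documented types, assuming StringPromptValue from langchain_core.prompt_values as a way to wrap a plain string into a PromptValue:

from langchain_core.prompt_values import StringPromptValue

retry_parser.parse_with_prompt(
    completion=data_json,  # currently must be a str
    prompt_value=StringPromptValue(text="Test prompt"),
)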

But the fact that the RetryOutputParser throws an error when the main chain runs successfully seems to be an issue in itself.

I would expect it to work like a try...catch block, where the catch is run only when an error happens in the try block. When the main chain runs successfully, RetryOutputParser should do nothing and just return the correct result.
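A minimal sketch of that expected pass-through behavior, assuming the TestModel, retry_parser, completion_chain, and prompt names from the example above (parse_or_passthrough is a hypothetical helper, not a library API):

from langchain_core.runnables import RunnableLambda, RunnableParallel

def parse_or_passthrough(args):
    completion = args['completion']
    # If the main chain already produced the Pydantic model, there is nothing
    # to retry; just return it as the final result.
    if isinstance(completion, TestModel):
        return completion
    return retry_parser.parse_with_prompt(
        completion=completion, prompt_value=args['prompt_value']
    )

chain = RunnableParallel(
    completion=completion_chain, prompt_value=prompt
) | RunnableLambda(parse_or_passthrough)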

Furthermore, even if the code above works, if the RetryOutputParser is fixed in the future to accept a non-string completion, and not just a str, I would be doing an unneeded conversion to JSON just so it converts back (though I don't think it would cause a performance problem).

lucasbasquerotto avatar Apr 15 '24 14:04 lucasbasquerotto