beta.chat.completions.parse returns unhandled ValidationError

Open marinomaria opened this issue 1 year ago • 19 comments

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • [X] This is an issue with the Python library

Describe the bug

On some occasions while using the Chat Completions API with Structured Outputs, the SDK fails and raises a ValidationError:

ValidationError: 1 validation error for RawResponse
  Invalid JSON: EOF while parsing a value at line 1 column 600 [type=json_invalid, input_value='                        ...                       ', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/json_invalid

This does not happen every time, but we use this call in a production service and the unpredictable behavior is hard to guard against.

To Reproduce

  1. Create a Pydantic model
  2. Instantiate an OpenAI client
  3. Use the method client.beta.chat.completions.parse(...) with the arguments shown below
  4. Repeat a few times to see the error
from pydantic import BaseModel
from openai import OpenAI

class RawResponse(BaseModel):
    answer: str

client = OpenAI(api_key=...)

# `messages` is built elsewhere in our service (omitted here for privacy)
completion = client.beta.chat.completions.parse(
    model='gpt-4o-2024-08-06',
    messages=messages,
    max_tokens=750,
    n=1,
    stop=None,
    temperature=0.1,
    response_format=RawResponse,
)

After a few times, this fails with:

ValidationError: 1 validation error for RawResponse
  Invalid JSON: EOF while parsing a value at line 1 column 600 [type=json_invalid, input_value='                        ...                       ', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/json_invalid

Code snippets

No response

OS

debian:bullseye-slim

Python version

CPython 3.10.8

Library version

openai 1.48.0

marinomaria avatar Sep 30 '24 15:09 marinomaria

Thanks for the report! It looks like your example script isn't complete; could you share a full script?

RobertCraigie avatar Sep 30 '24 15:09 RobertCraigie

Hi, thanks for the quick reply! Sadly I can't provide a full script for privacy reasons, but I'll be happy to share any information you need to identify the issue. Here's the traceback:

File "/app/src/core/modules/emiGPT/core/openai_chat_api.py", line 95, in _get_response	
  completion = self._client.beta.chat.completions.parse(	
File "/opt/venv/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 145, in parse	
  return _parse_chat_completion(	
File "/opt/venv/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 110, in parse_chat_completion	
  "parsed": maybe_parse_content(	
File "/opt/venv/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 161, in maybe_parse_content	
  return _parse_content(response_format, message.content)	
File "/opt/venv/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 221, in _parse_content	
  return cast(ResponseFormatT, model_parse_json(response_format, content))	
File "/opt/venv/lib/python3.10/site-packages/openai/_compat.py", line 166, in model_parse_json	
  return model.model_validate_json(data)	
File "/opt/venv/lib/python3.10/site-packages/pydantic/main.py", line 625, in model_validate_json	
  return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)	
pydantic_core._pydantic_core.ValidationError: 1 validation error for RawResponse	
  Invalid JSON: EOF while parsing a value at line 1 column 600 [type=json_invalid, input_value='                        ...                       ', input_type=str]	
    For further information visit https://errors.pydantic.dev/2.9/v/json_invalid

Please let me know if there's anything else you need.

marinomaria avatar Sep 30 '24 15:09 marinomaria

Could you share a request ID from a failing request? https://github.com/openai/openai-python#request-ids
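
For context, a minimal sketch of reading a request ID off a successful call, using the _request_id property described in the linked README section (the model and prompt here are illustrative):

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Say hello"}],
)

# Each response object carries the x-request-id header value
print(completion._request_id)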

RobertCraigie avatar Sep 30 '24 15:09 RobertCraigie

From the logs of our application I can see that the call to client.beta.chat.completions.parse(...) raised an exception, so there was no result from which to extract a request_id. 😞

marinomaria avatar Sep 30 '24 16:09 marinomaria

Ahhhh right, sorry. If you don't already have debug logging enabled, could you enable it (https://github.com/openai/openai-python#logging)? That should show a request ID in the logs.
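
A minimal sketch of enabling that, assuming the OPENAI_LOG environment variable described in the linked README section (it must be set before the client is used):

import logging
import os

# Enable the SDK's own debug logging via the documented env var;
# set this before importing/constructing the client.
os.environ["OPENAI_LOG"] = "debug"

# Alternatively, turn up the standard-library logger the SDK writes to.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)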

RobertCraigie avatar Sep 30 '24 16:09 RobertCraigie

Sure thing!

marinomaria avatar Sep 30 '24 18:09 marinomaria

Hi there! Any updates here? FYI - same thing happening for me, probably 50% of the time:

openai==1.53.0
pydantic==2.9.2
pydantic_core==2.23.4

I've solved this in the meantime with a tenacity retry, but it adds latency and extra calls, which isn't ideal...

from pydantic import ValidationError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# messages, model, async_client, MyPydanticModel, FUNCTION_LIST and logger
# come from the surrounding application code.

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0, min=0, max=0),  # no wait between retries
    retry=retry_if_exception_type(ValidationError),
    # Note: tenacity runs `before` ahead of every attempt, including the first,
    # so this notice is appended to `messages` even on the initial call.
    before=lambda retry_state: messages.append({
        "role": "system",
        "content": "Raised Exception: pydantic_core._pydantic_core.ValidationError. Please try again and conform to the model specs."
    })
)
async def attempt_parse():
    return await async_client.beta.chat.completions.parse(
        model=model,
        messages=messages,
        response_format=MyPydanticModel,
        functions=FUNCTION_LIST,
        function_call="auto",
    )

try:
    response = await attempt_parse()
except ValidationError as e:
    logger.error(f"Failed to parse response after all retries: {e}")
    raise

jonomillin avatar Nov 13 '24 19:11 jonomillin

I'm experiencing the same issue.

I'm considering consuming the API directly myself as this is a bit of a pain.

Is there an intention to address this bug?

Tom-OCRT avatar Dec 04 '24 00:12 Tom-OCRT

How complex are the models you're putting together? I wonder if it's just a token limit or something?

e.g. here's a simple Pydantic model from a test script I was playing around with:

from typing import Optional

from pydantic import BaseModel, Field

class UserInformation(BaseModel):
    name: Optional[str] = Field(description="Name of the user")
    email: Optional[str] = Field(description="Email address of the user")
    phone: Optional[str] = Field(description="Phone number of the user. Store this in the format '+1 123-456-7890'")
    title_role: Optional[str] = Field(description="The Title or Role of the user at their company")
    company_name: Optional[str] = Field(description="Name of the company the user works for")

which explodes into:

{
  "properties": {
    "name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "Name of the user",
      "title": "Name"
    },
    "email": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "Email address of the user",
      "title": "Email"
    },
    "phone": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "Phone number of the user. Store this in the format +1 123-456-7890",
      "title": "Phone"
    },
    "title_role": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "The Title or Role of the user at their company",
      "title": "Title Role"
    },
    "company_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "Name of the company the user works for",
      "title": "Company Name"
    }
  },
  "required": [
    "name",
    "email",
    "phone",
    "title_role",
    "company_name"
  ],
  "title": "UserInformation",
  "type": "object"
}

A fairly complex model feels like it would generate quite a few schema tokens, which might get missed/misinterpreted. I wonder if a simplified JSON blob for the response_format would help things? (A sketch of that idea follows below.)
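
For what it's worth, a sketch of that idea, bypassing .parse() and passing a hand-written, flatter schema via the json_schema response_format accepted by client.chat.completions.create (the schema below is a hypothetical, trimmed-down equivalent of the generated one, with nullable fields expressed as ["string", "null"] rather than anyOf):

import json
from openai import OpenAI

client = OpenAI()

# Hand-written, flatter schema instead of the Pydantic-generated anyOf expansion
schema = {
    "type": "object",
    "properties": {
        "name": {"type": ["string", "null"], "description": "Name of the user"},
        "email": {"type": ["string", "null"], "description": "Email address of the user"},
    },
    "required": ["name", "email"],
    "additionalProperties": False,
}

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract: Jane Doe, jane@example.com"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "user_information", "schema": schema, "strict": True},
    },
)

data = json.loads(completion.choices[0].message.content)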

jonomillin avatar Dec 04 '24 00:12 jonomillin

My schema is only slightly more complex than your example, very similar, but wrapped in an array allowing the model to return multiple entries for each prompt. Output token lengths are fairly reasonable, but I assume there must be pathological cases.

Tom-OCRT avatar Dec 04 '24 00:12 Tom-OCRT

Did anybody find a fix for this issue? It only recently appeared for me.

DeterjoSimon avatar Jan 16 '25 10:01 DeterjoSimon

Thanks @RobertCraigie for merging the PR! @DeterjoSimon the fix was just pushed and should be available in the next release!

rjoshi avatar Jan 16 '25 16:01 rjoshi

@rjoshi I don't think your PR would solve all the issues encountered, as the original report includes this in the error message:

line 1 column 600

which means the content was non-empty

RobertCraigie avatar Jan 16 '25 16:01 RobertCraigie

Thanks @RobertCraigie, I see. I have not yet run into this issue myself but will keep an eye out for it. We at Starspark.AI are betting heavily on structured responses to enable our product scenarios.

rjoshi avatar Jan 16 '25 16:01 rjoshi

We experience the same issue, but with a different model, gpt-4o-mini-2024-07-18, deployed on Azure OpenAI (Sweden Central).

Some diagnostic info:

# python -V
Python 3.11.8

# pip list | grep openai
langchain-openai                        0.3.0
openai                                  1.59.7

Here is our structured output model:

from typing import List, TypeAlias, Union

from langchain_core.documents.base import Document
from pydantic import BaseModel, Field

# Type aliases for improved readability
CitationList: TypeAlias = List["Citation"]
DocumentList: TypeAlias = List[Document]

class Citation(BaseModel):
    """
    Citation from a specific document that justifies an answer.
    Note:
    A chat model must include these document attributes in the context of the
    prompt to return a structured output.

    Attributes:
        document_id (int): The integer ID of a specific document which justifies
        the answer.
        quote (str): The verbatim quote from the specified source that justifies
        the answer.
        title (str): The title of the document that contains the quote.
        metadata_storage_name (str): The filename of the document, including its
        extension, in the storage system.
        metadata_storage_path (str): The path to the document in the storage
        system.
        source_url (str): The URL of the source document that contains
        the quote.
    """

    document_id: int = Field(
        ...,
        description=(
            "The integer ID of a SPECIFIC document which justifies the answer."
        ),
    )
    quote: str = Field(
        ...,
        description=(
            "The VERBATIM quote from the specified source that justifies the answer."
        ),
    )
    title: str = Field(
        ...,
        description="The title of the document that contains the quote.",
    )
    metadata_storage_name: str = Field(
        ...,
        description=(
            "The filename of the document, including its extension, in the storage"
            " system."
        ),
    )
    source_url: str | None = Field(
        None,
        description="The URL of the source document that contains the quote.",
    )

class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite the
    sources used."""

    answer: str = Field(
        ...,
        description=(
            "The answer to the user question, which is based only on the given"
            " sources."
        ),
    )
    citations: CitationList = Field(
        ...,
        description="Citations from the given sources that justify the answer.",
    )


class ConversationalResponse(BaseModel):
    """Respond in a conversational manner. Be kind and helpful."""

    response: str = Field(description="A conversational response to the user's query")


class FinalResponse(BaseModel):
    """Final response containing either quoted or conversational answer."""

    final_output: Union[QuotedAnswer, ConversationalResponse]

ms-86 avatar Jan 17 '25 08:01 ms-86

@ms-86 How do you trigger the parsing error? Does it appear with a specific class? I'm guessing you're attaching the Pydantic schema to a LangChain LLM, something like llm.with_structured_output(schema=FinalResponse)?

DeterjoSimon avatar Jan 17 '25 12:01 DeterjoSimon

@DeterjoSimon You're correct. But since we've been experiencing a lot of hallucinations in our responses (not matching the schema we defined), we decided to also include the raw answer and, in case of a parsing error, parse it ourselves using some heuristics:

chat_model.with_structured_output(
    schema=FinalResponse, include_raw=True
)
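
For other readers, a sketch of consuming that raw fallback, using the dict shape ("raw" / "parsed" / "parsing_error") that include_raw=True returns in LangChain; heuristic_parse is a hypothetical stand-in for our heuristic:

structured_llm = chat_model.with_structured_output(
    schema=FinalResponse, include_raw=True
)

result = structured_llm.invoke(messages)
if result["parsing_error"] is not None:
    # Fall back to the raw message content and apply our own heuristic
    final = heuristic_parse(result["raw"].content)  # hypothetical helper
else:
    final = result["parsed"]  # a FinalResponse instance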

ms-86 avatar Jan 17 '25 13:01 ms-86

Today I found something in our telemetry that could help pin down the root cause of this bug. It seems the input handed to the Pydantic schema for parsing is just full of 0x0A (line feed) characters:

{
  "errors": [
    {
      "type": "json_invalid",
      "loc": "()",
      "msg": "Invalid JSON: EOF while parsing a value at line 1804 column 0",
      "input": "\n   \n\n \n   \n\n \n   \n\n \n   \n\n ... (the same whitespace pattern, repeated for roughly 1,800 lines) ... \n\n \n   \n\n",
      "ctx": { "error": "EOF while parsing a value at line 1804 column 0" },
      "url": "https://errors.pydantic.dev/2.10/v/json_invalid"
    }
  ],
  "title": "FinalResponse",
  "json": "[{\"type\":\"json_invalid\",\"loc\":[],\"msg\":\"Invalid JSON: EOF while parsing a value at line 1804 column 0\",\"input\":\"\\n   \\n (more newline chars..."
}
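
Given that telemetry, one possible guard until the root cause is fixed is to call the non-parsing endpoint and validate manually, rejecting whitespace-only content before it ever reaches Pydantic. A sketch, reusing RawResponse from the original report (messages stands in for your own prompt):

from openai import OpenAI
from pydantic import BaseModel

class RawResponse(BaseModel):
    answer: str

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=messages,  # your prompt, instructing JSON output
    response_format={"type": "json_object"},
)

content = completion.choices[0].message.content or ""
if not content.strip():
    # Whitespace-only output, as in the telemetry above: retry rather than
    # handing it to Pydantic, and keep the request ID for a bug report
    raise RuntimeError(f"Empty model output (request {completion._request_id})")

parsed = RawResponse.model_validate_json(content)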

ms-86 avatar Jan 22 '25 09:01 ms-86

Is this issue fixed? I'm having it too, and I don't know if it's because I'm using OpenAI through Azure or something like that.

valenradovich avatar Mar 18 '25 13:03 valenradovich