Structured output broken for composite (nested) Pydantic models in the OpenAI compatibility layer
Description of the bug:
Here's an MRE (minimal reproducible example):
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "google-generativeai==0.8.3",
#     "openai==1.55.3",
#     "pydantic==2.10.2",
# ]
# ///
import os
from typing import Annotated

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI(
    api_key=os.environ["GOOGLE_AI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)


class FooWorking(BaseModel):
    foo: Annotated[str, Field()]


class Bar(BaseModel):
    bar: Annotated[str, Field()]


class FooNotWorking(BaseModel):
    foo: Annotated[Bar, Field()]


# Will work
client.beta.chat.completions.parse(
    model="gemini-1.5-flash",
    messages=[
        {"role": "user", "content": "What is bar?"},
    ],
    response_format=FooWorking,
)
print("First call worked!")

# Will NOT work!
client.beta.chat.completions.parse(
    model="gemini-1.5-flash",
    messages=[
        {"role": "user", "content": "What is bar?"},
    ],
    response_format=FooNotWorking,
)
Both calls work just fine against the OpenAI API itself. Just a hypothesis, but maybe handling of $defs is broken/unsupported on the Google side. A separate issue is that the plain 400 error message is not informative/helpful:
openai.BadRequestError: Error code: 400 - [{'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT'}}]
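To make the $defs hypothesis concrete: with Pydantic 2, the nested Bar model is hoisted into $defs and referenced via $ref, so the schema below is roughly what the failing model produces before the SDK wraps it into response_format. You can check it locally without hitting the API:

import json
from typing import Annotated

from pydantic import BaseModel, Field


class Bar(BaseModel):
    bar: Annotated[str, Field()]


class FooNotWorking(BaseModel):
    foo: Annotated[Bar, Field()]


# Pydantic 2 hoists the nested Bar model into "$defs" and points at it with "$ref",
# which is the construct suspected to be unsupported on the Google side.
print(json.dumps(FooNotWorking.model_json_schema(), indent=2))
# Expect something like:
# {
#   "$defs": {
#     "Bar": {"properties": {"bar": {"title": "Bar", "type": "string"}},
#             "required": ["bar"], "title": "Bar", "type": "object"}
#   },
#   "properties": {"foo": {"$ref": "#/$defs/Bar"}},
#   "required": ["foo"],
#   "title": "FooNotWorking",
#   "type": "object"
# }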
Actual vs expected behavior:
The 2nd call should work. If this is a limitation, it should be documented in the docs:
- https://ai.google.dev/gemini-api/docs/structured-output?lang=python
- https://ai.google.dev/gemini-api/docs/openai
Any other information you'd like to share?
No response
Still having this issue.
This doesn't work for me either. I have a simpler case:
{
  "model": "gemini-1.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Extract first book r...umane way of living."
    }
  ],
  "temperature": 0.0,
  "max_tokens": 8192,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "BookReference",
      "schema": {
        "properties": {
          "title": {
            "title": "Title",
            "type": "string"
          },
          "author": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "type": "null"
              }
            ],
            "title": "Author"
          }
        },
        "required": [
          "title",
          "author"
        ],
        "title": "BookReference",
        "type": "object",
        "additionalProperties": false
      }
    }
  }
}
and I get the extremely unhelpful INVALID_ARGUMENT error, which doesn't tell me what the invalid argument actually is!
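For context, that schema looks like what Pydantic would generate for a model roughly like the sketch below (a hypothetical reconstruction, not the commenter's actual code); the anyOf-with-null branch on author comes from the Optional field, which ties into the later comment about Optional not being handled:

from typing import Optional

from pydantic import BaseModel


class BookReference(BaseModel):
    title: str
    # Optional[str] is rendered in the JSON schema as
    # anyOf: [{"type": "string"}, {"type": "null"}],
    # which may be the part the backend rejects.
    author: Optional[str]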
Tool use doesn't work either, because the ChatCompletion comes back with an invalid toolCalls field instead of the expected tool_calls.
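A minimal sketch of the kind of tool-use call being described (get_weather is a hypothetical tool for illustration, and the empty result is the behavior reported above, not something confirmed independently):

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GOOGLE_AI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Expected: a populated tool_calls list. Reported: None/invalid, apparently because
# the raw payload carries a camelCase "toolCalls" key the OpenAI SDK doesn't map.
print(completion.choices[0].message.tool_calls)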
This feels like the OpenAI compatibility layer was rushed out without any validation of whether anything beyond plain-text scenarios actually works. For me, structured outputs, tool use, and images are all broken in the OpenAI layer.
I was really hoping to use this as a drop-in replacement, but that doesn't seem to be the case.
It seems like the API lacks support for certain typing constructs like Annotated, Union, and Optional. I got it working with class Location(BaseModel): city: str = Field(None, description="City of the event")
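A minimal sketch of that reported workaround, reusing the compatibility endpoint from the MRE above: a single flat model with a plain str field and a Field default, avoiding Annotated, Optional, Union, and nested models entirely:

import os

from openai import OpenAI
from pydantic import BaseModel, Field


class Location(BaseModel):
    # Flat field with a Field default; no Annotated/Optional/nested models.
    city: str = Field(None, description="City of the event")


client = OpenAI(
    api_key=os.environ["GOOGLE_AI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

completion = client.beta.chat.completions.parse(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "The event is held in Paris."}],
    response_format=Location,
)
print(completion.choices[0].message.parsed)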