
add json generation support for openai

JerryKwan opened this issue 1 year ago · 17 comments


Some sample input/output of json generation looks like the following

In [3]: class User(BaseModel):
   ...:     name: str
   ...:     last_name: str
   ...:     id: int
   ...: 

In [4]: generator = generate.json(model, User)
   ...: result = generator(
   ...:     "Create a user profile with the fields name, last_name and id"
   ...: )

In [5]: result
Out[5]: User(name='John', last_name='Doe', id=12345)

JerryKwan avatar Jul 22 '24 03:07 JerryKwan

I think this would result in a lot of user complaints about validation errors. Unlike local models which outlines supports, OpenAI doesn't guarantee the output will match the specified schema.

lapp0 avatar Jul 22 '24 15:07 lapp0

> I think this would result in a lot of user complaints about validation errors. Unlike local models which outlines supports, OpenAI doesn't guarantee the output will match the specified schema.

I think it's fine as long as we return an informative error which states the problem is on OpenAI's side.

I was thinking we should use OpenAI's function calling for this feature.
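
For reference, forcing a function call with the openai SDK looks roughly like this (a minimal sketch; the model name, schema, and prompt are illustrative):

from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    last_name: str
    id: int

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Create a user profile with the fields name, last_name and id"}],
    # Advertise a single "function" whose parameters are the model's JSON Schema...
    tools=[{
        "type": "function",
        "function": {"name": "User", "parameters": User.model_json_schema()},
    }],
    # ...and force the model to call it.
    tool_choice={"type": "function", "function": {"name": "User"}},
)
arguments = response.choices[0].message.tool_calls[0].function.arguments
user = User.model_validate_json(arguments)  # may still raise a ValidationError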

rlouf avatar Jul 22 '24 16:07 rlouf

@lapp0 @rlouf Thanks for the feedback. We can change to use function calling (tool_choice). But I have a question about OpenAI; as you can see from the documentation:

response_format (object, optional)

An object specifying the format that the model must output. Compatible with [GPT-4 Turbo](https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo) and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.

Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.

If we set response_format to 'json_object', the response should be a valid JSON string. Have you encountered cases where the API returned an invalid JSON string?
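
For context, a minimal JSON-mode request looks roughly like this (a sketch; the model name and messages are illustrative):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",  # illustrative; any JSON-mode-capable model works
    messages=[
        # JSON mode requires instructing the model to produce JSON in a message
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "Create a user profile with the fields name, last_name and id"},
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)  # syntactically valid JSON

Note that JSON mode only guarantees syntactically valid JSON; it does not guarantee the output conforms to any particular schema, which is where validation against the Pydantic model can still fail.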

JerryKwan avatar Jul 23 '24 00:07 JerryKwan

Yes, it happens, but we can return a meaningful error when it does.
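
A sketch of what that could look like (OpenAISchemaError is a hypothetical name, not an existing outlines class):

from pydantic import BaseModel, ValidationError

class OpenAISchemaError(Exception):
    """Hypothetical error: OpenAI returned output that does not match the schema."""

def parse_openai_response(model: type[BaseModel], content: str) -> BaseModel:
    try:
        return model.model_validate_json(content)
    except ValidationError as e:
        raise OpenAISchemaError(
            "OpenAI returned output that does not match the requested schema. "
            "This is a limitation of the OpenAI API, not of outlines."
        ) from e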

rlouf avatar Jul 25 '24 21:07 rlouf

@rlouf Thanks for the comments. I will change it to function calling, and raise an appropriate warning when an invalid JSON response is received.

JerryKwan avatar Jul 25 '24 23:07 JerryKwan

@rlouf To use function calling, we may need to add tools and tool_choice to OpenAIConfig. Please confirm whether that is appropriate, thank you.
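
A sketch of the two new fields, assuming OpenAIConfig remains a plain dataclass (the existing fields shown are abbreviated and illustrative):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OpenAIConfig:
    model: str = ""
    max_tokens: Optional[int] = None
    response_format: Optional[dict] = None
    # ...other existing fields elided...
    # New fields for function calling:
    tools: Optional[list] = None        # e.g. [{"type": "function", "function": {...}}]
    tool_choice: Optional[dict] = None  # e.g. {"type": "function", "function": {"name": "User"}}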

JerryKwan avatar Jul 26 '24 02:07 JerryKwan

@rlouf I have changed it to function calling, would you please help review? Thank you.

JerryKwan avatar Aug 05 '24 01:08 JerryKwan

Hi! In light of the recent OpenAI Structured Outputs support, could this PR be changed into a "pass-through" solution that uses OpenAI's API?

acatovic avatar Aug 12 '24 14:08 acatovic

@acatovic Absolutely YES. But I think we should consider compatibility issues. According to the source code at https://github.com/openai/openai-python/blob/main/src/openai/lib/_tools.py#L39, the new native SDK support for structured outputs uses function calling internally; it is just a wrapper around function calling, and the feature only exists in openai versions >= v1.40.0. I am afraid most users are not on those versions yet.

JerryKwan avatar Aug 13 '24 01:08 JerryKwan

@rlouf Would you please give some suggestions about what we should do next? Thank you.

JerryKwan avatar Aug 14 '24 07:08 JerryKwan

Let's use OpenAI's structured output support, which should be easy! We can always raise an exception if the installed SDK version is < 1.40 and invite users to upgrade to the latest version.
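
A sketch of that guard, using the packaging library for the version comparison (the exact error message is illustrative):

from importlib.metadata import version
from packaging.version import Version

if Version(version("openai")) < Version("1.40.0"):
    raise ImportError(
        "Structured Outputs requires openai>=1.40.0; "
        "please upgrade with `pip install -U openai`."
    )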

rlouf avatar Aug 14 '24 10:08 rlouf

@rlouf Bad news. We encountered the following errors:

File /usr/local/lib/python3.12/dataclasses.py:1320, in asdict(obj, dict_factory)
   1318 if not _is_dataclass_instance(obj):
   1319     raise TypeError("asdict() should be called on dataclass instances")
-> 1320 return _asdict_inner(obj, dict_factory)

File /usr/local/lib/python3.12/dataclasses.py:1330, in _asdict_inner(obj, dict_factory)
   1326 elif _is_dataclass_instance(obj):
   1327     # fast path for the common case
   1328     if dict_factory is dict:
   1329         return {
-> 1330             f.name: _asdict_inner(getattr(obj, f.name), dict)
   1331             for f in fields(obj)
   1332         }
   1333     else:
   1334         result = []

File /usr/local/lib/python3.12/dataclasses.py:1364, in _asdict_inner(obj, dict_factory)
   1359     return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
   1360 elif isinstance(obj, (list, tuple)):
   1361     # Assume we can create an object of this type by passing in a
   1362     # generator (which is not true for namedtuples, handled
   1363     # above).
-> 1364     return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
   1365 elif isinstance(obj, dict):
   1366     if hasattr(type(obj), 'default_factory'):
   1367         # obj is a defaultdict, which has a different constructor from
   1368         # dict as it requires the default_factory as its first arg.

File /usr/local/lib/python3.12/dataclasses.py:1364, in <genexpr>(.0)
   1359     return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
   1360 elif isinstance(obj, (list, tuple)):
   1361     # Assume we can create an object of this type by passing in a
   1362     # generator (which is not true for namedtuples, handled
   1363     # above).
-> 1364     return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
   1365 elif isinstance(obj, dict):
   1366     if hasattr(type(obj), 'default_factory'):
   1367         # obj is a defaultdict, which has a different constructor from
   1368         # dict as it requires the default_factory as its first arg.

File /usr/local/lib/python3.12/dataclasses.py:1373, in _asdict_inner(obj, dict_factory)
   1371             result[_asdict_inner(k, dict_factory)] = _asdict_inner(v, dict_factory)
   1372         return result
-> 1373     return type(obj)((_asdict_inner(k, dict_factory),
   1374                       _asdict_inner(v, dict_factory))
   1375                      for k, v in obj.items())
   1376 else:
   1377     return copy.deepcopy(obj)

File /usr/local/lib/python3.12/dataclasses.py:1374, in <genexpr>(.0)
   1371             result[_asdict_inner(k, dict_factory)] = _asdict_inner(v, dict_factory)
   1372         return result
   1373     return type(obj)((_asdict_inner(k, dict_factory),
-> 1374                       _asdict_inner(v, dict_factory))
   1375                      for k, v in obj.items())
   1376 else:
   1377     return copy.deepcopy(obj)

File /usr/local/lib/python3.12/dataclasses.py:1373, in _asdict_inner(obj, dict_factory)
   1371             result[_asdict_inner(k, dict_factory)] = _asdict_inner(v, dict_factory)
   1372         return result
-> 1373     return type(obj)((_asdict_inner(k, dict_factory),
   1374                       _asdict_inner(v, dict_factory))
   1375                      for k, v in obj.items())
   1376 else:
   1377     return copy.deepcopy(obj)

TypeError: PydanticFunctionTool.__init__() missing 1 required positional argument: 'model'

and

File ~/test/outlines/outlines/caching.py:104, in cache.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    101     return await cached_function(*args, **kwargs)
    103 cache_key = wrapper.__cache_key__(*args, **kwargs)
--> 104 result = wrapper.__memory__.get(cache_key, default=ENOVAL, retry=True)
    106 if result is ENOVAL:
    107     result = await cached_function(*args, **kwargs)

File /usr/local/lib/python3.12/site-packages/diskcache/core.py:1149, in Cache.get(self, key, default, read, expire_time, tag, retry)
   1123 def get(
   1124     self,
   1125     key,
   (...)
   1130     retry=False,
   1131 ):
   1132     """Retrieve value from cache. If `key` is missing, return `default`.
   1133 
   1134     Raises :exc:`Timeout` error when database timeout occurs and `retry` is
   (...)
   1147 
   1148     """
-> 1149     db_key, raw = self._disk.put(key)
   1150     update_column = EVICTION_POLICY[self.eviction_policy]['get']
   1151     select = (
   1152         'SELECT rowid, expire_time, tag, mode, filename, value'
   1153         ' FROM Cache WHERE key = ? AND raw = ?'
   1154         ' AND (expire_time IS NULL OR expire_time > ?)'
   1155     )

File ~/test/outlines/outlines/caching.py:20, in CloudpickleDisk.put(self, key)
     19 def put(self, key):
---> 20     data = cloudpickle.dumps(key)
     21     return super().put(data)

File /usr/local/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1479, in dumps(obj, protocol, buffer_callback)
   1477 with io.BytesIO() as file:
   1478     cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1479     cp.dump(obj)
   1480     return file.getvalue()

File /usr/local/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1245, in Pickler.dump(self, obj)
   1243 def dump(self, obj):
   1244     try:
-> 1245         return super().dump(obj)
   1246     except RuntimeError as e:
   1247         if len(e.args) > 0 and "recursion" in e.args[0]:

TypeError: cannot pickle 'sqlite3.Connection' object

It seems there are some conflicts between PydanticFunctionTool and outlines.base.vectorize / outlines.caching.cache.

By the way, the native structured output feature only exists in the beta module, e.g. client.beta.chat.completions.parse.
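
For reference, the beta helper is meant to be used like this on its own, outside outlines (a minimal sketch; the model name is illustrative):

from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    last_name: str
    id: int

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Create a user profile with the fields name, last_name and id"}],
    response_format=User,  # a Pydantic model class, not a dict
)
user = completion.choices[0].message.parsed  # already a User instance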

JerryKwan avatar Aug 15 '24 05:08 JerryKwan

It seems the "cannot pickle 'sqlite3.Connection' object" error relates to https://github.com/pydantic/pydantic/issues/8232

JerryKwan avatar Aug 15 '24 09:08 JerryKwan

Since OpenAI uses JSON Schema under the hood, we can always use the to_strict_json_schema function directly to convert the Pydantic model into a JSON Schema that is compatible with the OpenAI API, send this schema to the API, and parse the response. That's a little more "manual", but not overwhelmingly so.
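
A sketch of that manual path (note that to_strict_json_schema lives in the private module openai.lib._pydantic, so this relies on an internal helper; model name and prompt are illustrative):

from openai import OpenAI
from openai.lib._pydantic import to_strict_json_schema
from pydantic import BaseModel

class User(BaseModel):
    name: str
    last_name: str
    id: int

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Create a user profile with the fields name, last_name and id"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "User",
            "strict": True,
            "schema": to_strict_json_schema(User),
        },
    },
)
user = User.model_validate_json(response.choices[0].message.content)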

rlouf avatar Aug 15 '24 11:08 rlouf

@rlouf We can get the JSON schema using the to_strict_json_schema function, but the problem is that we cannot use the new client.beta.chat.completions.parse method. As you can see from the following exceptions, there are some conflicts to be resolved between OpenAI's native structured output and our code for now.

File /usr/local/lib/python3.12/site-packages/openai/resources/beta/chat/completions.py:330, in AsyncCompletions.parse(self, messages, model, response_format, frequency_penalty, function_call, functions, logit_bias, logprobs, max_tokens, n, parallel_tool_calls, presence_penalty, seed, service_tier, stop, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
    320 _validate_input_tools(tools)
    322 extra_headers = {
    323     "X-Stainless-Helper-Method": "beta.chat.completions.parse",
    324     **(extra_headers or {}),
    325 }
    327 raw_completion = await self._client.chat.completions.create(
    328     messages=messages,
    329     model=model,
--> 330     response_format=_type_to_response_format(response_format),
    331     frequency_penalty=frequency_penalty,
    332     function_call=function_call,
    333     functions=functions,
    334     logit_bias=logit_bias,
    335     logprobs=logprobs,
    336     max_tokens=max_tokens,
    337     n=n,
    338     parallel_tool_calls=parallel_tool_calls,
    339     presence_penalty=presence_penalty,
    340     seed=seed,
    341     service_tier=service_tier,
    342     stop=stop,
    343     stream_options=stream_options,
    344     temperature=temperature,
    345     tool_choice=tool_choice,
    346     tools=tools,
    347     top_logprobs=top_logprobs,
    348     top_p=top_p,
    349     user=user,
    350     extra_headers=extra_headers,
    351     extra_query=extra_query,
    352     extra_body=extra_body,
    353     timeout=timeout,
    354 )
    355 return _parse_chat_completion(
    356     response_format=response_format,
    357     chat_completion=raw_completion,
    358     input_tools=tools,
    359 )

File /usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py:244, in type_to_response_format_param(response_format)
    239 # type checkers don't narrow the negation of a `TypeGuard` as it isn't
    240 # a safe default behaviour but we know that at this point the `response_format`
    241 # can only be a `type`
    242 response_format = cast(type, response_format)
--> 244 if not is_basemodel_type(response_format):
    245     raise TypeError(f"Unsupported response_format type - {response_format}")
    247 return {
    248     "type": "json_schema",
    249     "json_schema": {
   (...)
    253     },
    254 }

File /usr/local/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py:220, in is_basemodel_type(typ)
    219 def is_basemodel_type(typ: type) -> TypeGuard[type[pydantic.BaseModel]]:
--> 220     return issubclass(typ, pydantic.BaseModel)

File <frozen abc>:123, in __subclasscheck__(cls, subclass)

TypeError: issubclass() arg 1 must be a class

JerryKwan avatar Aug 16 '24 02:08 JerryKwan

It seems the problem lies in the following function, combined with the default None response_format in our code:

https://github.com/openai/openai-python/blob/b143c164678ad7579161448846ce67be62e7f21f/src/openai/lib/_parsing/_completions.py#L230

JerryKwan avatar Aug 16 '24 05:08 JerryKwan

@rlouf We can use the following logic to bypass the conflicts and get the expected result:

        # NOTE: snippet from inside the generator; `model` is the Pydantic class,
        # `replace` is dataclasses.replace, and `generate_structured_output` is
        # our helper that performs the API call.
        import openai

        if not hasattr(openai, "pydantic_function_tool"):
            raise NotImplementedError(
                "The OpenAI library does not support native Structured Outputs, "
                "please upgrade to the latest version."
            )
        # Construct the tools payload manually to bypass the conflicts between
        # cloudpickle and IPython.
        tools = [
            {
                "type": "function",
                "function": {
                    "name": model.__name__,
                    "strict": True,
                    "parameters": openai.lib._pydantic.to_strict_json_schema(model),
                },
            }
        ]
        response_format = openai.NOT_GIVEN
        config = replace(
            self.config, max_tokens=max_tokens, tools=tools, response_format=response_format
        )
        response, prompt_tokens, completion_tokens = generate_structured_output(
            prompt, system_prompt, self.client, config
        )
        # Convert the response to a Pydantic object. This should be removed once
        # the bug in pydantic's interaction with cloudpickle and IPython is fixed.
        response = openai._compat.model_parse_json(model, response)

and the output looks like

In [1]: from pydantic import BaseModel
   ...: import outlines
   ...: from outlines import models, generate
   ...: 
   ...: class User(BaseModel):
   ...:     name: str
   ...:     last_name: str
   ...:     id: int
   ...: 

In [2]: model = models.openai('gpt-4o-mini',
   ...:                       api_key="sk-proj-xxxxxxxxxxxxxxxxxxxxx",
   ...:                       )

In [3]: generator = generate.json(model, User)
   ...: result = generator(
   ...:     "Create a user profile with the fields name, last_name and id"
   ...: )

In [4]: result
Out[4]: User(name='John', last_name='Doe', id=1)

But I don't think this is an elegant solution, and we will need to refactor it later (once the bugs in the dependencies are fixed). @rlouf Would you please give some comments? Should we adopt this as a temporary solution and refactor it later? Thank you.

JerryKwan avatar Aug 16 '24 06:08 JerryKwan