In Structured Output, a JSON schema with a date string format will yield invalid JSON

Open oscarjohansson94 opened this issue 1 year ago • 2 comments

System Info

lorax 0.9.0, running with docker.

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

I am getting invalid JSON when using Structured Output with a JSON schema that contains a string property with a date format.

This might be because LoRAX structured output uses Outlines, and Outlines does not support all JSON Schema formats (https://github.com/outlines-dev/outlines/issues/215). But I would still expect LoRAX to output a valid JSON object, or to clearly document this behavior.

curl --request POST \
  --url <lorax_url>/generate \
  --header 'Content-Type: application/json' \
  --data '{
  "inputs": "set Today to 20222",
  "parameters": {
    "response_format": {
          "type": "json_object",
          "schema": {
            "properties": {
              "today": {
                "format": "date",
                "title": "Today",
                "type": "string"
              }
            },
            "required": ["today"],
            "title": "Test",
            "type": "object"
          }
    }
  }
}'

This results in output such as:

{
  "generated_text": "{\n\n    \"today\": 2022-02-22\n}"
}

This JSON output is invalid because the date value is not quoted.
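The failure can be confirmed by running the `generated_text` value through a standard JSON parser. Below is a minimal sketch in Python using the output shown above, plus a hand-quoted variant for comparison. The `is_valid_json` helper name is illustrative and not part of LoRAX.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# generated_text from the response above: the date is not quoted.
unquoted = '{\n\n    "today": 2022-02-22\n}'
# The same object with the date quoted, for comparison.
quoted = '{\n\n    "today": "2022-02-22"\n}'

print(is_valid_json(unquoted))  # False
print(is_valid_json(quoted))    # True
```

The parser reads `2022` as a number and then rejects the `-` that follows, which is why the unquoted date breaks the whole object.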

If you remove the format from the schema, the result is a valid JSON object:

curl --request POST \
  --url <lorax_url>/generate \
  --header 'Content-Type: application/json' \
  --data '{
  "inputs": "set Today to 20222",
  "parameters": {
    "response_format": {
          "type": "json_object",
          "schema": {
            "properties": {
              "today": {
                "title": "Today",
                "type": "string"
              }
            },
            "required": ["today"],
            "title": "Test",
            "type": "object"
          }
    }
  }
}'

Output:

{
  "generated_text": "{\n\n    \"today\": \"2022-02-22\"\n}"
}

However, without the format constraint, I guess nothing prevents the LLM from outputting a string that is not a valid date.
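One possible client-side guard for that case is to check the returned string after parsing. This is a minimal sketch in Python, not something LoRAX provides: `parse_today` is a hypothetical helper, and it assumes the schema's `date` format means a `YYYY-MM-DD` calendar date.

```python
import json
import re
from datetime import date

# Assumption: "date" means exactly four-digit year, two-digit month and day.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_today(generated_text: str) -> date:
    """Parse the model output and return the "today" field as a date.

    Raises ValueError if the text is not valid JSON (json.JSONDecodeError
    is a ValueError subclass), if "today" is missing or not a string, or
    if it is not a real YYYY-MM-DD calendar date.
    """
    payload = json.loads(generated_text)
    value = payload.get("today") if isinstance(payload, dict) else None
    if not isinstance(value, str) or DATE_RE.fullmatch(value) is None:
        raise ValueError(f"'today' is not a YYYY-MM-DD string: {value!r}")
    # fromisoformat rejects impossible dates such as month 13.
    return date.fromisoformat(value)
```

With this check, a response like `{"today": "20222"}` is rejected even though it is valid JSON and satisfies the schema without the format.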

Expected behavior

I expect the structured output to be valid JSON that follows the provided JSON schema.

oscarjohansson94 • Apr 05 '24 14:04

same problem

prd-tuong-nguyen • Apr 12 '24 04:04

@jeffreyftang I see that you are assigned to this. Looks like this is solved by #567, and all that is needed is an upgrade to the latest version of outlines.

oscarjohansson94 • Apr 23 '24 07:04

Thanks @oscarjohansson94, we updated Outlines to v0.0.40 in #447, which should address this issue. Closing for now.

tgaddair • May 23 '24 19:05