[feat] Add support for impure raw structured outputs when streaming
Description
Streaming the raw and validated outputs is extremely helpful and intuitive: users don't need to wait for the entire response to be generated before Guardrails' validation kicks in. Guardrails supports streaming the raw and validated outputs for OpenAI callables and any custom LLM wrapper callables. It expects the callables to return a generator (of chunks) when `stream=True` is passed in a `guard()` call.
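For context, here is a hedged sketch of such a streaming call; the exact `Guard` construction and call signature are assumptions and may differ across Guardrails versions:

```python
# Sketch only: Guard construction and call signature are assumptions.
import openai
import guardrails as gd

guard = gd.Guard.from_rail("person.rail")  # hypothetical RAIL spec

# With stream=True, the LLM callable is expected to return a generator of
# chunks; guard(...) then yields raw/validated output incrementally.
for fragment in guard(
    openai.chat.completions.create,
    model="gpt-3.5-turbo",
    stream=True,
):
    print(fragment)
```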
For validating structured (JSON) outputs, it works on the crucial assumption that the raw model output contains only pure JSON that Guardrails can validate. If the raw output contains any accompanying text, parsing and validation will fail. (This is a documented limitation in the original streaming PR.)
e.g. Raw output with pure JSON:
```json
{
  "name": "John Doe",
  "age": 39
}
```
In the above case, the streaming validation will work.
e.g. Raw output with accompanying text:
```
Here is a valid JSON object you requested:
{
  "name": "John Doe",
  "age": 39
}
```
In this case, parsing and validation will fail.
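To illustrate the failure mode, a plain `json.loads` (a minimal stand-in for the parsing step) rejects such impure output immediately:

```python
import json

raw = 'Here is a valid JSON object you requested:\n{"name": "John Doe", "age": 39}'
json.loads(raw)  # raises json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```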
Why is this needed
- This will make streaming more robust and not dependent on the raw LLM output being pure JSON.
- This will support more use cases and more providers. Currently, in our tests only OpenAI's `gpt-3.5-turbo`, `gpt-4-turbo`, and `text-davinci-003` (now shut off) produced pure JSON output when prompted with detailed instructions. Other OSS models with the same prompt and instructions did not.
Rough implementation details
Currently, the entire workflow for streaming goes through `StreamRunner`. Just add 2 simple checks before parsing and validation kick in (see the sketch after this list):
- Check 1: Check each chunk and simply wait for the opening tags (optionally along with the code type) OR `{`. Ignore all chunks before this point.
- Check 2: Check for `}` or the closing tags, and ignore any future chunks.
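A minimal sketch of these two checks, assuming a hypothetical `filter_json_chunks` helper (this is not the actual `StreamRunner` code, and for brevity the brace tracking ignores braces inside JSON string literals):

```python
from typing import Iterator

def filter_json_chunks(chunks: Iterator[str]) -> Iterator[str]:
    """Hypothetical helper: pass through only the JSON payload of a stream.

    Check 1: discard everything until the first '{' appears.
    Check 2: once the matching '}' closes the object, ignore all
    remaining chunks.
    """
    started = False
    depth = 0
    for chunk in chunks:
        buffered = []
        for ch in chunk:
            if not started:
                if ch == "{":  # Check 1: opening brace found
                    started = True
                    depth = 1
                    buffered.append(ch)
                continue
            buffered.append(ch)
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:  # Check 2: object closed
                    yield "".join(buffered)
                    return  # ignore any future chunks
        if buffered:
            yield "".join(buffered)

# Usage: leading/trailing prose is dropped, chunk boundaries are preserved.
chunks = iter([
    'Here is a valid JSON object you requested:\n{\n  "name": "Jo',
    'hn Doe",\n  "age": 39\n}\nLet me know if you need anything else!',
])
print("".join(filter_json_chunks(chunks)))  # prints just the JSON object
```

The same idea extends to fenced code blocks: instead of `{`/`}`, the helper would watch for the opening and closing triple-backtick tags mentioned in Check 1 and Check 2.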
Adding convo from Slack:
@thekaranacharya:
@Caleb Courier in regards to this Discord message: https://discord.com/channels/1085077079697150023/1197022557870772284/1197188300948111391 This works due to this function, right? Or is there somewhere else we look for `` ```json `` - and potentially there we could also look for the `{`?
@CalebCourier
correct, that's where it all happens
@thekaranacharya
Okay, that func calls this function, which finds code within the opening and closing tags, right? And for that we need to have the entire output? I'm wondering whether we would need to change that a bit to a) also include `{` and b) support streaming. Currently, in streaming we don't utilise this or extract_json_from_ouput at all.
@CalebCourier
Yeah, I think there's at least one new function for extracting content using `{` as the border, and a new arg to determine if a partial block is useful based on whether or not we're streaming.
@thekaranacharya
For b, we could create a new func that checks each chunk and simply waits for the opening tags (optionally along with the code type) OR `{`, and then starts processing the outputs. It also keeps checking for `}` or closing tags and then exits.
> a new arg to determine if a partial block is useful based on whether or not we're streaming
We probably won't require this: when streaming, the entire workflow goes through `StreamRunner`, and we can just add a new func there before parsing and validation. Currently, when streaming we just don't check for JSON extraction; it runs on the assumption that the model will generate only pure JSON.
@CalebCourier
Oh, so it just waits for everything before parsing? If that's the case then yeah, we won't need it, but if we want to parse on each cumulative chunk we would.
@thekaranacharya
No, no, we don't wait at all. It just skips extracting JSON like we do traditionally in `Runner`. We simply assume that the output is pure JSON. I know it's not ideal - just opening an issue on GitHub regarding this discussion:
https://github.com/guardrails-ai/guardrails/issues/542
Related to: https://github.com/guardrails-ai/guardrails/issues/543