genkit
Custom Format Parsers
Currently, the output() method on Genkit results leniently parses JSON whenever format is supplied or an output schema is present. There are, however, many reasons a developer might want more control over output parsing than this.
The proposal is to extend the format option into a registry of formatters, with some defaults provided (text and json at a minimum). To define a custom format parser, developers would write something like this:
defineFormat({
  name: 'csv',
  parseResponse: (res) => {
    const toParse = extractCodeFence(res.text());
    return parseCSV(toParse);
  },
  // optional, if omitted streaming `.output()` is not supported for this format type
  parseChunk: (chunk, partialResponse) => {
    // here `chunk` contains only the most recent data, `partialResponse` contains
    // a Message with all chunks received so far
    const toParse = extractPartialCodeFence(partialResponse.text());
    return parseCSV(toParse);
  },
  instructions: (req) => {
    return `Output should be in CSV format with the following columns:\n\n${schemaToCSVSpec(req.output.schema)}.`;
  }
});
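To make the registry idea concrete, here is a minimal sketch of how named formats might be stored and looked up. It is an illustration under stated assumptions, not Genkit's implementation: `FormatDefinition`, `formatRegistry`, and `resolveFormat` are hypothetical names, and parsers take plain strings here rather than the response objects used in the proposal.

```typescript
// Hypothetical, simplified shape of a format definition (not the real Genkit type).
interface FormatDefinition {
  name: string;
  // Parses the complete response text.
  parseResponse: (text: string) => unknown;
  // Optional: without it, streaming output would be unsupported for this format.
  parseChunk?: (chunk: string, accumulated: string) => unknown | null;
}

// The registry itself: format name -> definition.
const formatRegistry = new Map<string, FormatDefinition>();

function defineFormat(def: FormatDefinition): FormatDefinition {
  if (formatRegistry.has(def.name)) {
    throw new Error(`Format '${def.name}' is already defined.`);
  }
  formatRegistry.set(def.name, def);
  return def;
}

// Looks up a format by the name passed in `output.format`.
function resolveFormat(name: string): FormatDefinition {
  const def = formatRegistry.get(name);
  if (!def) {
    throw new Error(`Unknown format '${name}'.`);
  }
  return def;
}

// Built-in defaults, as the proposal suggests (text and json at a minimum).
defineFormat({ name: 'text', parseResponse: (text) => text });
defineFormat({ name: 'json', parseResponse: (text) => JSON.parse(text) });
```

With something like this in place, resolving the format option becomes a lookup by name, and an unregistered name fails loudly instead of silently falling back to JSON parsing.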
The parser definition semantics should be flexible enough to handle many different scenarios, including:
- Built-up response: where each chunk in a stream returns an incrementally more complete response
- Buffered chunking: where e.g. JSONL is streamed but only on complete objects (so parseChunk must have the ability to return null and not emit a chunk to the end user)
- Options/Schema: allow a format parser to access the full request including output schema and control how output validation occurs.
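The first two scenarios can be sketched as two styles of chunk parser. This is a rough illustration only: the `ChunkParser` type and both function names are hypothetical, and the parsers take plain strings (the newest chunk plus all text received so far) rather than Genkit chunk and message objects. It also assumes the accumulated text always ends with the newest chunk.

```typescript
// Hypothetical signature: `chunk` is the newest text, `accumulated` is everything so far.
type ChunkParser<T> = (chunk: string, accumulated: string) => T | null;

// Built-up response: re-parse everything received so far on every chunk,
// so each emitted value is a more complete version of the previous one.
const parseLinesBuiltUp: ChunkParser<string[]> = (_chunk, accumulated) =>
  accumulated.split('\n').filter((line) => line.trim() !== '');

// Buffered chunking for JSONL: emit only the objects whose line was completed
// by this chunk, and return null when no line was completed.
const parseJsonlBuffered: ChunkParser<unknown[]> = (chunk, accumulated) => {
  // Text that had already arrived before this chunk.
  const previous = accumulated.slice(0, accumulated.length - chunk.length);
  // Start of the first line that had not been terminated before this chunk.
  const start = previous.lastIndexOf('\n') + 1;
  // Position of the last newline seen so far, i.e. the end of the last complete line.
  const end = accumulated.lastIndexOf('\n');
  if (end < start) {
    return null; // this chunk did not complete any line
  }
  const completed = accumulated
    .slice(start, end)
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
  return completed.length > 0 ? completed : null;
};
```

Returning null from the buffered parser is what lets the caller skip emitting anything to the end user until a whole object is available.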
Using a custom format is simple: pass its name to the format option that already exists in output.
generate({
  prompt: "Generate a contact list with 10 people.",
  output: {
    format: 'csv',
    schema: z.array(z.object({firstName: z.string(), lastName: z.string()})),
  },
});
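For a request like this one, the csv format's parser would need to turn the model's reply into an array of objects. The sketch below shows one way the helpers named earlier might behave. Both `extractCodeFence` and `parseCSV` are hypothetical stand-ins written for illustration: the CSV handling is deliberately naive (no quoted fields or embedded commas), and validation against the output schema is left out.

```typescript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = '`'.repeat(3);

// Hypothetical helper: returns the body of the first fenced block in the text,
// or the whole text if the model did not use a fence.
function extractCodeFence(text: string): string {
  const pattern = new RegExp(FENCE + '[^\\n]*\\n([\\s\\S]*?)' + FENCE);
  const match = text.match(pattern);
  return (match ? match[1] : text).trim();
}

// Hypothetical helper: naive CSV parsing where the first line holds the column names.
function parseCSV(csv: string): Record<string, string>[] {
  const lines = csv
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '');
  if (lines.length === 0) {
    return [];
  }
  const headers = lines[0].split(',').map((header) => header.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',').map((cell) => cell.trim());
    const row: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      row[header] = i < cells.length ? cells[i] : '';
    });
    return row;
  });
}

// Example model reply containing a fenced CSV block.
const reply =
  'Here is the list:\n' +
  FENCE + 'csv\n' +
  'firstName,lastName\n' +
  'Ada,Lovelace\n' +
  'Alan,Turing\n' +
  FENCE;

const contacts = parseCSV(extractCodeFence(reply));
```

Here `contacts` holds one object per CSV row, keyed by the header names, which is the shape the schema in the request describes.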