
Custom Format Parsers

Open mbleigh opened this issue 1 year ago • 0 comments

Currently, the output() method on Genkit results attempts to leniently parse JSON whenever a format is supplied or an output schema is present. There are a variety of reasons, however, why someone might want greater control over output parsing than this.

The proposal is to extend the format option so that it refers to a registry of formatters, with some defaults provided (text and json at a minimum). To define a custom format parser, developers would write something like this:

defineFormat({
  name: 'csv',
  parseResponse: (res) => {
    const toParse = extractCodeFence(res.text());
    return parseCSV(toParse);
  },
  // optional, if omitted streaming `.output()` is not supported for this format type
  parseChunk: (chunk, partialResponse) => {
    // here `chunk` contains only the most recent data, `partialResponse` contains
    // a Message with all chunks received so far
    const toParse = extractPartialCodeFence(partialResponse.text());
    return parseCSV(toParse);
  },
  instructions: (req) => {
    return `Output should be in CSV format with the following columns:\n\n${schemaToCSVSpec(req.output.schema)}.`;
  }
});
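
As a rough illustration of the "registry of formatters" idea, here is a minimal sketch of what the registry behind defineFormat could look like. Everything in it is an assumption made for illustration: the ResponseLike, RequestLike, and FormatDefinition types, the formatRegistry map, and the resolveFormat helper are hypothetical names, not Genkit's actual API. The default json format here uses strict JSON.parse rather than the lenient parsing described above.

```typescript
// Hypothetical sketch: these types and helpers are NOT Genkit's real API.
interface ResponseLike {
  text(): string;
}

interface RequestLike {
  output?: { format?: string; schema?: unknown };
}

interface FormatDefinition<T = unknown> {
  name: string;
  parseResponse: (res: ResponseLike) => T;
  // Optional. Returning null means "emit nothing for this chunk".
  parseChunk?: (chunk: ResponseLike, partialResponse: ResponseLike) => T | null;
  instructions?: (req: RequestLike) => string;
}

// Registry keyed by format name.
const formatRegistry = new Map<string, FormatDefinition>();

function defineFormat<T>(def: FormatDefinition<T>): FormatDefinition<T> {
  if (formatRegistry.has(def.name)) {
    throw new Error(`Format '${def.name}' is already defined`);
  }
  formatRegistry.set(def.name, def as FormatDefinition);
  return def;
}

function resolveFormat(name: string): FormatDefinition {
  const def = formatRegistry.get(name);
  if (!def) {
    throw new Error(`Unknown format '${name}'`);
  }
  return def;
}

// Built-in defaults. The json default is strict here for brevity;
// a real implementation would presumably keep the lenient parsing.
defineFormat<string>({
  name: 'text',
  parseResponse: (res) => res.text(),
});

defineFormat<unknown>({
  name: 'json',
  parseResponse: (res) => JSON.parse(res.text()),
});
```

With a registry like this, the format string in output simply becomes a lookup key, and an unknown name fails loudly instead of silently falling back to JSON.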

The parser definition semantics should be flexible enough to handle many different scenarios, including:

  1. Built-up response: where each chunk in a stream returns an incrementally more complete response
  2. Buffered chunking: where, e.g., JSONL is streamed but chunks are emitted only for complete objects (so parseChunk must be able to return null, in which case no chunk is emitted to the end user)
  3. Options/Schema: where a format parser needs access to the full request, including the output schema, and control over how output validation occurs.
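
The buffered chunking scenario can be sketched as follows. This is a minimal, hypothetical parseChunk for JSONL, assuming that partialResponse already includes the text of the current chunk (as described in the comment in the example above) and that a line counts as complete only once its trailing newline has arrived. MessageLike, completeLines, and parseJsonlChunk are illustrative names, not part of Genkit.

```typescript
// Hypothetical sketch of a buffered JSONL chunk parser; not Genkit's real API.
interface MessageLike {
  text(): string;
}

// Returns the non-empty lines that are terminated by a newline.
// Any trailing text after the last newline is treated as incomplete.
function completeLines(text: string): string[] {
  const lastNewline = text.lastIndexOf('\n');
  if (lastNewline === -1) {
    return [];
  }
  return text
    .slice(0, lastNewline)
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Emits only the objects whose lines were completed by the latest chunk.
// Returns null when the chunk completed no new line, signalling that
// nothing should be emitted to the end user.
function parseJsonlChunk(
  chunk: MessageLike,
  partialResponse: MessageLike
): unknown[] | null {
  const all = partialResponse.text();
  // Assumption: `all` ends with the text of `chunk`.
  const before = all.slice(0, all.length - chunk.text().length);
  const alreadyEmitted = completeLines(before).length;
  const fresh = completeLines(all).slice(alreadyEmitted);
  if (fresh.length === 0) {
    return null;
  }
  return fresh.map((line) => JSON.parse(line));
}
```

Because the function derives what was already emitted from the accumulated text, it stays stateless, which matches the (chunk, partialResponse) signature proposed above. A final line without a trailing newline would be left for parseResponse to handle.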

Using a custom format is simple: pass its name to the format option that already exists in output.

generate({
  prompt: "Generate a contact list with 10 people.",
  output: {
    format: 'csv',
    schema: z.array(z.object({firstName: z.string(), lastName: z.string()})),
  },
});

mbleigh, Jul 29 '24 17:07