[Proposal] Add "modalities" to output config

Open mbleigh opened this issue 8 months ago • 0 comments

At least two general-purpose models (Gemini 2.0 Flash Exp and GPT-4o) now support native multimodal output, as opposed to being image-only models.

I propose a new model configuration option under output called modalities:

{
  output: {
    modalities?: ('text' | 'image' | 'video' | 'audio')[]
  }
}

Modalities should also be part of the supports metadata for a model. When supplied, modalities indicates which kinds of output the user is looking for. This is separate from format, which controls how the output is parsed, though the line between the two is a little muddy.
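
To make the relationship between the requested modalities and a model's supports metadata concrete, here is a rough sketch of how the two could be checked against each other. Everything beyond the modalities option itself is an assumption made for illustration: the OutputConfig and ModelSupports type names, the flat shape of the supports entry, the unsupportedModalities helper, and the text-only fallback are all hypothetical and not part of any existing API.

```typescript
// Hypothetical names throughout: a sketch of the proposal, not an existing API.
type Modality = 'text' | 'image' | 'video' | 'audio';

interface OutputConfig {
  // Requested output modalities; omitted means "use the model's default".
  modalities?: Modality[];
  // Parsing hint, independent of modalities (for example 'json' or 'text').
  format?: string;
}

interface ModelSupports {
  // Assumed shape: the output modalities a model declares it can produce.
  modalities?: Modality[];
}

// Returns the requested modalities that the model does not declare support for.
function unsupportedModalities(
  config: OutputConfig,
  supports: ModelSupports,
): Modality[] {
  const requested = config.modalities ?? [];
  // Assumption: a model that declares nothing is treated as text-only.
  const supported = new Set<Modality>(supports.modalities ?? ['text']);
  return requested.filter((m) => !supported.has(m));
}

const multimodalModel: ModelSupports = { modalities: ['text', 'image'] };
const missing = unsupportedModalities(
  { modalities: ['text', 'image', 'audio'], format: 'text' },
  multimodalModel,
);
console.log(missing);
```

In this sketch, format rides along untouched: it only describes how text output would be parsed, while modalities decides what kinds of output are requested at all. A caller could reject the request or drop the unsupported entries depending on how strict the framework wants to be.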

Mar 27 '25 17:03 mbleigh