genkit
[Proposal] Add "modalities" to output config
At least two general-purpose models (Gemini 2.0 Flash Exp and GPT-4o) now support native multimodal output, as opposed to being image-only models.
I propose a new model configuration option under output called modalities:
{
  output: {
    modalities?: ('text' | 'image' | 'video' | 'audio')[]
  }
}
Modalities should also be part of the supports metadata for a model, so that callers can tell which output types a model can produce. When supplied, modalities guide what kind of output the user is looking for. This is separate from format, which guides how the output is going to be parsed, although the line between the two is a little muddy.
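To make the relationship between the requested modalities and the supports metadata concrete, here is a minimal sketch of how a request could be checked against a model's declared capabilities. Everything in it is hypothetical and for illustration only: the names Modality, OutputConfig, ModelSupports, and unsupportedModalities are not existing Genkit APIs, and the text-only fallback for models that declare nothing is an assumption rather than part of the proposal.

```typescript
// Hypothetical types sketching the proposal; these are NOT Genkit's actual API.
type Modality = 'text' | 'image' | 'video' | 'audio';

interface OutputConfig {
  // What kind of output the caller wants (the proposed option).
  modalities?: Modality[];
  // How the output should be parsed (kept separate from modalities).
  format?: string;
}

interface ModelSupports {
  // Assumed shape: the output modalities a model declares it can produce.
  modalities?: Modality[];
}

// Returns the requested modalities that the model does not declare support for.
function unsupportedModalities(
  output: OutputConfig,
  supports: ModelSupports
): Modality[] {
  const requested = output.modalities ?? [];
  // Assumption: a model that declares nothing is treated as text-only.
  const supported = new Set<Modality>(supports.modalities ?? ['text']);
  return requested.filter((m) => !supported.has(m));
}

// Example: a model that declares text and image output.
const exampleSupports: ModelSupports = { modalities: ['text', 'image'] };
const missing = unsupportedModalities(
  { modalities: ['text', 'audio'] },
  exampleSupports
);
if (missing.length > 0) {
  console.log(`Model cannot produce: ${missing.join(', ')}`);
}
```

A check like this could run before the request is sent, so that asking for an unsupported modality fails early with a clear message instead of relying on the provider's error.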