instructor_ex
Multimodal support for Gemini
Currently, it's only possible to send text messages using the Gemini adapter:
https://github.com/thmsmlr/instructor_ex/blob/1abd8473d05111c11a4d9033b6a88acc29737fa0/lib/instructor/adapters/gemini.ex#L61
The Gemini API supports image, video, and audio inputs. Unlike the OpenAI API, where you send the file contents base64-encoded, with Gemini you need to upload the file separately.
Would you be open to a PR that adds support for uploading files, or would you consider that out of scope for this project?
If it's out of scope, I can create a smaller PR that allows media URLs, with the upload happening outside the library:
Instructor.chat_completion(
  mode: :json_schema,
  model: "gemini-1.5-flash",
  response_model: VideoDesc,
  messages: [
    %{
      role: "user",
      content: [
        %{
          type: "video_url",
          video_url: %{
            url: "https://generativelanguage.googleapis.com/v1beta/files/..."
          }
        },
        %{
          type: "text",
          text: " what's going on in this video?"
        }
      ]
    }
  ]
)
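For the smaller PR, the adapter would mainly need to translate OpenAI-style content parts like the ones above into Gemini `parts`. Below is a minimal sketch of that mapping, assuming Gemini's REST `file_data` part shape (`mime_type` plus `file_uri`). The module `GeminiParts`, the function `to_parts/1`, the optional `:mime_type` key on `video_url`, and the `"video/mp4"` default are all hypothetical and not part of instructor_ex.

```elixir
defmodule GeminiParts do
  @moduledoc """
  Hypothetical helper (not part of instructor_ex): converts OpenAI-style
  message content into a list of Gemini `parts`.
  """

  # A plain string becomes a single text part.
  def to_parts(content) when is_binary(content), do: [%{text: content}]

  # A list of typed content parts is converted part by part.
  def to_parts(content) when is_list(content), do: Enum.map(content, &to_part/1)

  # Text parts map directly onto Gemini text parts.
  defp to_part(%{type: "text", text: text}), do: %{text: text}

  # A URL to a file already uploaded via the Gemini File API becomes a
  # `file_data` part. The `:mime_type` key is a hypothetical extension of the
  # proposed `video_url` map; "video/mp4" is an assumed default.
  defp to_part(%{type: "video_url", video_url: %{url: url} = video}) do
    %{file_data: %{mime_type: Map.get(video, :mime_type, "video/mp4"), file_uri: url}}
  end
end
```

A Gemini message could then be built as `%{role: "user", parts: GeminiParts.to_parts(content)}`. Keeping the upload outside the library means the adapter only has to pass the file URI through.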