goose feat(chat): Implement multi-modal image input

This PR introduces the ability for users to attach and send images along with text messages in the chat interface. Note: This initial implementation adds multi-modal support specifically for the OpenAI provider.

Key Changes:

Frontend (Input.tsx, ChatView.tsx, UserMessage.tsx):

Allows pasting images into the input field.
Displays image previews below the input text area.
Allows removing attached images before sending.
Sends both text and image data (as base64 URIs) in the createUserMessage payload.
Renders user-uploaded images within the UserMessage component.
Updates handleSubmit and createUserMessage to handle the new multi-modal structure.
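
As a rough illustration of the payload described above, here is a minimal sketch of how text and pasted images might be combined into one multi-modal user message. The type names and the helpers `toDataUri`, `buildUserMessage`, and `removeImage` are hypothetical and not taken from the PR; the real `createUserMessage` signature is not shown in this thread.

```typescript
// Hypothetical content shapes; the PR's real types are not shown in this thread.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; dataUri: string };
type ContentPart = TextPart | ImagePart;

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Build a base64 data URI from a MIME type and an already-encoded payload.
function toDataUri(mimeType: string, base64: string): string {
  return `data:${mimeType};base64,${base64}`;
}

// Assemble one user message: the text part first (if any), then one part per image.
function buildUserMessage(text: string, imageDataUris: string[]): UserMessage {
  const content: ContentPart[] = [];
  const trimmed = text.trim();
  if (trimmed.length > 0) {
    content.push({ type: "text", text: trimmed });
  }
  for (const dataUri of imageDataUris) {
    content.push({ type: "image", dataUri });
  }
  return { role: "user", content };
}

// Removing an attachment before sending amounts to dropping it by index.
function removeImage(imageDataUris: string[], index: number): string[] {
  return imageDataUris.filter((_, i) => i !== index);
}
```

Keeping the attachments as a plain array of data URIs means the same list could drive both the preview thumbnails and the final payload, though the PR may structure its state differently.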

Backend (openai.rs, utils.rs):

Refactors the format_messages function for the OpenAI provider to correctly handle the new array-based content format (mixing text and image parts in a single message).
Ensures tool responses containing images are handled gracefully (placeholder text + separate image message).
Updates convert_image utility to correctly format data URIs for OpenAI.
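
To make the array-based content format concrete, the sketch below shows the message shapes the OpenAI Chat Completions API accepts for mixed text and image input (`text` and `image_url` parts). It is written in TypeScript for illustration only; the PR's actual implementation is Rust code in openai.rs and utils.rs and is not reproduced here. The `ImageData` shape, the function names, and the placeholder wording are assumptions.

```typescript
// Content parts as accepted by the OpenAI Chat Completions API.
type OpenAiPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type OpenAiMessage =
  | { role: "user"; content: OpenAiPart[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical internal image shape: raw base64 data plus its MIME type.
interface ImageData {
  mimeType: string;
  data: string;
}

// Wrap an image as an image_url part whose URL is a base64 data URI.
function convertImage(image: ImageData): OpenAiPart {
  const url = `data:${image.mimeType};base64,${image.data}`;
  return { type: "image_url", image_url: { url } };
}

// A user message mixes text and image parts in a single content array.
function formatUserMessage(text: string, images: ImageData[]): OpenAiMessage {
  const content: OpenAiPart[] = [];
  if (text.length > 0) {
    content.push({ type: "text", text });
  }
  content.push(...images.map(convertImage));
  return { role: "user", content };
}

// A tool response with images becomes a text-only tool message carrying a
// placeholder, followed by a separate user message that carries the images.
// The placeholder wording is an assumption, not the PR's actual text.
function formatToolResponse(
  toolCallId: string,
  text: string,
  images: ImageData[],
): OpenAiMessage[] {
  if (images.length === 0) {
    return [{ role: "tool", tool_call_id: toolCallId, content: text }];
  }
  const placeholder = "This tool result included an image; see the next message.";
  const toolText = text.length > 0 ? `${text}\n${placeholder}` : placeholder;
  return [
    { role: "tool", tool_call_id: toolCallId, content: toolText },
    { role: "user", content: images.map(convertImage) },
  ];
}
```

Splitting the tool response this way keeps the tool message as plain text while still getting the image to the model in a follow-up message.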

Here is a screenshot of how it looks:

[Screenshot: 1-96f09dc5]

Apr 08 '25 14:04 lars-hagen

@lars-hagen awesome! Appreciate this work. I am attaching designs that our designer Spencer made for this. Note: there are concepts in the screens that are not implemented yet, but the key things to look at are the drag-and-drop support (this could be split into a separate PR) and the image preview style.

Other than that, if this is OpenAI-specific, we likely have to handle the cases where it's not 🤔

Apr 11 '25 00:04 nahiyankhan

A quick win @lars-hagen would be to render the images above the text entry areas. This is so sick!!

Apr 11 '25 14:04 spencrmartin

Addressing the feedback from @nahiyankhan and @spencrmartin, along with various other improvements.

Drop Zone UI:

Added an Upload icon to visually indicate the drop target area.
Updated the drop zone styling to use a neutral grey dashed border and removed the background color.
Refined the indicator text to be more descriptive: "Drop files here to upload into your goose chat".

Image Preview Styling:

Reduced the size of attached image previews in the input area.
Repositioned the "remove" (X) button to the top-right corner of the preview and adjusted its offset for better spacing.

These changes aim to improve the drag-and-drop user experience and align the image preview styling closer to the design feedback.

Here is a video of how it looks.

https://github.com/user-attachments/assets/f4544de4-35fc-4ed2-a097-77aa08926b96

Apr 15 '25 13:04 lars-hagen

Woah, this is cool! So for non-OpenAI providers, what will happen? It won't behave the same, but it won't break?

Apr 15 '25 21:04 michaelneale

Some review comments left for you, @lars-hagen! Let us know if you have any additional follow-up questions. Thank you so much for your contribution <3

Apr 22 '25 15:04 taniandjerry

Is there a way to provide images using the CLI? Maybe even through recipes or plans?

May 11 '25 17:05 BjoernRave

Looks great! Tested locally and goose ran into a server internal error when I attempted to chat with the screenshot. @lars-hagen Maybe just need to pull in main? @lily-de is this happening for you also?

May 12 '25 19:05 zanesq

Looks like this has been done in a different way in the meantime. Is this still relevant?

Jun 17 '25 02:06 michaelneale

No, I'll close it down. Thanks for the reminder, and good job team goose on your multi-modal implementation, it looks nice :)

Jun 17 '25 22:06 lars-hagen