feat(chat): Implement multi-modal image input
This PR introduces the ability for users to attach and send images along with text messages in the chat interface. Note: This initial implementation adds multi-modal support specifically for the OpenAI provider.
Key Changes:
Frontend (Input.tsx, ChatView.tsx, UserMessage.tsx):
- Allows pasting images into the input field.
- Displays image previews below the input text area.
- Allows removing attached images before sending.
- Sends both text and image data (as base64 URIs) in the createUserMessage payload.
- Renders user-uploaded images within the UserMessage component.
- Updates handleSubmit and createUserMessage to handle the new multi-modal structure.
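For reviewers, here is a minimal sketch of what a combined text-plus-image payload could look like. The type names (`ImageAttachment`, `ContentPart`, `UserMessagePayload`) and the helper `buildUserMessagePayload` are hypothetical and chosen for illustration; the actual shapes used by createUserMessage in this PR may differ.

```typescript
// Hypothetical shapes for illustration; not taken from the PR's source.
interface ImageAttachment {
  mimeType: string; // e.g. "image/png"
  base64: string; // raw base64 payload, without the "data:" prefix
}

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; dataUri: string };

interface UserMessagePayload {
  role: "user";
  content: ContentPart[];
}

// Build a data URI of the form data:<mime>;base64,<payload>.
function toDataUri(image: ImageAttachment): string {
  return `data:${image.mimeType};base64,${image.base64}`;
}

// Combine the (optional) text and any attached images into one payload.
function buildUserMessagePayload(
  text: string,
  images: ImageAttachment[]
): UserMessagePayload {
  const content: ContentPart[] = [];
  const trimmed = text.trim();
  if (trimmed.length > 0) {
    content.push({ type: "text", text: trimmed });
  }
  for (const image of images) {
    content.push({ type: "image", dataUri: toDataUri(image) });
  }
  return { role: "user", content };
}
```

Keeping text and images as parts of a single array means an image-only message (empty text) needs no special casing in this sketch.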
Backend (openai.rs, utils.rs):
- Refactors the format_messages function for the OpenAI provider to correctly handle the new array-based content format (mixing text and image parts in a single message).
- Ensures tool responses containing images are handled gracefully (placeholder text + separate image message).
- Updates convert_image utility to correctly format data URIs for OpenAI.
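To make the target format concrete, here is a rough TypeScript sketch of the message shapes involved. It is not the Rust code in openai.rs: the names `MessagePart`, `formatUserMessage`, `formatToolResponse`, and the placeholder wording are assumptions for illustration. The `text` and `image_url` part types follow the OpenAI Chat Completions content-part format, and the tool-response split assumes tool messages should carry only text, which is why the image moves to a separate user message.

```typescript
// Hypothetical internal representation of a message's content parts.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "image"; dataUri: string };

// Content parts in the OpenAI Chat Completions format.
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface OpenAIMessage {
  role: "user" | "tool";
  tool_call_id?: string;
  content: string | OpenAIPart[];
}

// Placeholder wording is an assumption; the PR does not quote the text it uses.
const IMAGE_PLACEHOLDER = "[image returned by tool; see the following message]";

function toOpenAIPart(part: MessagePart): OpenAIPart {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  return { type: "image_url", image_url: { url: part.dataUri } };
}

// A user message keeps text and images together in one content array.
function formatUserMessage(parts: MessagePart[]): OpenAIMessage {
  return { role: "user", content: parts.map((part) => toOpenAIPart(part)) };
}

// A tool response with images becomes a text-only tool message (with a
// placeholder) followed by a separate user message carrying the images.
function formatToolResponse(
  toolCallId: string,
  parts: MessagePart[]
): OpenAIMessage[] {
  const texts: string[] = [];
  const images: OpenAIPart[] = [];
  for (const part of parts) {
    if (part.type === "text") {
      texts.push(part.text);
    } else {
      images.push(toOpenAIPart(part));
    }
  }
  if (images.length === 0) {
    return [{ role: "tool", tool_call_id: toolCallId, content: texts.join("\n") }];
  }
  return [
    {
      role: "tool",
      tool_call_id: toolCallId,
      content: [...texts, IMAGE_PLACEHOLDER].join("\n"),
    },
    { role: "user", content: images },
  ];
}
```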
Here is a screenshot of how it looks:
@lars-hagen awesome! Appreciate this work. I am attaching designs that our designer Spencer had around this. Note: there are concepts in the screens that are not implemented yet, but the key things to look at are the drag-and-drop support (this could be split into a separate PR) and the image preview style.
Other than that, if this is OpenAI-specific, we likely have to handle the cases for when it's not 🤔
A quick win @lars-hagen would be to render the images above the text entry areas. This is so sick!!
Addressing the feedback from @nahiyankhan and @spencrmartin, plus various other improvements.
Drop Zone UI:
- Added an Upload icon to visually indicate the drop target area.
- Updated the drop zone styling to use a neutral grey dashed border and removed the background color.
- Refined the indicator text to be more descriptive: "Drop files here to upload into your goose chat".
Image Preview Styling:
- Reduced the size of attached image previews in the input area.
- Repositioned the "remove" (X) button to the top-right corner of the preview and adjusted its offset for better spacing.
These changes aim to improve the drag-and-drop user experience and align the image preview styling closer to the design feedback.
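As a small illustration of the drop handling, the sketch below filters dropped files down to images before they are attached. This is an assumption about how the drop zone might behave, not a description of the code in this PR; `DroppedFile` is a hypothetical structural subset of the browser `File` type, and `pickImageFiles` is a made-up helper name.

```typescript
// Hypothetical minimal shape; a browser File object satisfies it structurally.
interface DroppedFile {
  name: string;
  type: string; // MIME type, e.g. "image/png"; may be empty if unknown
}

// Keep only files whose MIME type marks them as images.
function pickImageFiles<T extends DroppedFile>(files: T[]): T[] {
  return files.filter((file) => file.type.startsWith("image/"));
}
```

In a drop handler this could be called as `pickImageFiles(Array.from(event.dataTransfer.files))`, with non-image files either ignored or reported to the user.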
Here is a video of how it looks.
https://github.com/user-attachments/assets/f4544de4-35fc-4ed2-a097-77aa08926b96
woah this is cool! So for non-OpenAI providers, what will happen? It won't behave the same, but it won't break?
Some reviewed comments left for you, @lars-hagen ! Let us know if you have any additional followup questions. Thank you so much for your contribution <3
is there a way to provide images using the cli? maybe even through recipes or plans
Looks great! Tested locally and goose ran into a server internal error when I attempted to chat with the screenshot. @lars-hagen Maybe just need to pull in main? @lily-de is this happening for you also?
Looks like this has been done in a different way in the meantime. Is this still relevant?
No, I'll close it down. Thanks for the reminder, and good job team goose on your implementation of multi-modal, it looks nice :)