
feat(chat): Implement multi-modal image input

Open lars-hagen opened this issue 8 months ago • 7 comments

This PR introduces the ability for users to attach and send images along with text messages in the chat interface. Note: This initial implementation adds multi-modal support specifically for the OpenAI provider.

Key Changes:

Frontend (Input.tsx, ChatView.tsx, UserMessage.tsx):

  • Allows pasting images into the input field.
  • Displays image previews below the input text area.
  • Allows removing attached images before sending.
  • Sends both text and image data (as base64 URIs) in the createUserMessage payload.
  • Renders user-uploaded images within the UserMessage component.
  • Updates handleSubmit and createUserMessage to handle the new multi-modal structure.
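For illustration, a minimal sketch of how a text-plus-images payload could be assembled. The `ContentPart` and `UserMessage` shapes, the `(text, imageDataUris)` signature, and the validation regex are assumptions made for this example; only the `createUserMessage` name and the use of base64 data URIs come from the description above.

```typescript
// Hypothetical content-part shapes; the PR's real types are not shown here.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; dataUri: string };
type ContentPart = TextPart | ImagePart;

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Matches base64 image data URIs such as "data:image/png;base64,iVBORw0...".
const IMAGE_DATA_URI = /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/]+=*$/i;

// Assumed signature: builds one message whose content mixes text and images.
function createUserMessage(text: string, imageDataUris: string[] = []): UserMessage {
  const content: ContentPart[] = [];
  const trimmed = text.trim();
  if (trimmed.length > 0) {
    content.push({ type: "text", text: trimmed });
  }
  for (const dataUri of imageDataUris) {
    // Reject anything that is not a base64 image data URI before it is sent.
    if (!IMAGE_DATA_URI.test(dataUri)) {
      throw new Error("Expected a base64 image data URI");
    }
    content.push({ type: "image", dataUri });
  }
  return { role: "user", content };
}
```

Keeping text and images as ordered parts of a single array means the message can carry only text, only images, or both, without separate code paths.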

Backend (openai.rs, utils.rs):

  • Refactors the format_messages function for the OpenAI provider to correctly handle the new array-based content format (mixing text and image parts in a single message).
  • Ensures tool responses containing images are handled gracefully (placeholder text + separate image message).
  • Updates convert_image utility to correctly format data URIs for OpenAI.
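The backend is written in Rust; the sketch below uses TypeScript only to show the target message shapes, and is not the PR's code. The `Part` and `InternalMessage` types, the camelCase function names, and the placeholder wording are assumptions. The `image_url` content part with a `data:<mime>;base64,<data>` URL follows the OpenAI chat completions format for image input.

```typescript
// Hypothetical internal message shapes (not taken from the PR).
type Part =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; base64: string };

type InternalMessage =
  | { role: "user"; content: Part[] }
  | { role: "tool"; toolCallId: string; content: Part[] };

// Shapes used by the OpenAI chat completions API.
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type OpenAIMessage =
  | { role: "user"; content: OpenAIPart[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Wraps raw base64 image data in a data URI inside an image_url part.
function convertImage(mimeType: string, base64: string): OpenAIPart {
  return { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } };
}

function toOpenAIPart(part: Part): OpenAIPart {
  return part.type === "text"
    ? { type: "text", text: part.text }
    : convertImage(part.mimeType, part.base64);
}

function formatMessages(messages: InternalMessage[]): OpenAIMessage[] {
  const out: OpenAIMessage[] = [];
  for (const msg of messages) {
    if (msg.role === "user") {
      // User messages keep text and images together in one content array.
      out.push({ role: "user", content: msg.content.map(toOpenAIPart) });
      continue;
    }
    // Tool responses: keep the text, replace each image with a placeholder,
    // and forward the images in a separate user message.
    const texts: string[] = [];
    const images: OpenAIPart[] = [];
    for (const part of msg.content) {
      if (part.type === "text") {
        texts.push(part.text);
      } else {
        texts.push("[image returned by tool; see next message]");
        images.push(convertImage(part.mimeType, part.base64));
      }
    }
    out.push({ role: "tool", tool_call_id: msg.toolCallId, content: texts.join("\n") });
    if (images.length > 0) {
      out.push({ role: "user", content: images });
    }
  }
  return out;
}
```

Splitting tool images into a follow-up user message keeps the tool response itself text-only, which is the "placeholder text + separate image message" behavior described above.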

Here is a screenshot of how it looks:

[screenshot: 1-96f09dc5]

lars-hagen avatar Apr 08 '25 14:04 lars-hagen

@lars-hagen awesome! Appreciate this work. I am attaching designs that our designer Spencer made around this. Note: there are concepts in the screens that aren't implemented yet, but the key things to look at are the drag and drop support (this could be split into a separate PR) and the image preview style.

[screenshot: Screenshot 2025-04-10 at 8 28 17 PM]

Other than that, if this is OpenAI-specific, we likely have to handle the cases where it's not 🤔

nahiyankhan avatar Apr 11 '25 00:04 nahiyankhan

A quick win @lars-hagen would be to render the images above the text entry areas. This is so sick!!

spencrmartin avatar Apr 11 '25 14:04 spencrmartin

Addressing the feedback from @nahiyankhan and @spencrmartin, plus various other improvements.

Drop Zone UI:

  • Added an Upload icon to visually indicate the drop target area.
  • Updated the drop zone styling to use a neutral grey dashed border and removed the background color.
  • Refined the indicator text to be more descriptive: "Drop files here to upload into your goose chat".

Image Preview Styling:

  • Reduced the size of attached image previews in the input area.
  • Repositioned the "remove" (X) button to the top-right corner of the preview and adjusted its offset for better spacing.

These changes aim to improve the drag-and-drop user experience and align the image preview styling closer to the design feedback.

Here is a video of how it looks.

https://github.com/user-attachments/assets/f4544de4-35fc-4ed2-a097-77aa08926b96

lars-hagen avatar Apr 15 '25 13:04 lars-hagen

woah this is cool! So for non-OpenAI providers, what will happen? It won't behave the same, but it won't break?

michaelneale avatar Apr 15 '25 21:04 michaelneale

Some review comments left for you, @lars-hagen! Let us know if you have any follow-up questions. Thank you so much for your contribution <3

taniandjerry avatar Apr 22 '25 15:04 taniandjerry

Is there a way to provide images using the CLI? Maybe even through recipes or plans?

BjoernRave avatar May 11 '25 17:05 BjoernRave

Looks great! Tested locally, and goose ran into an internal server error when I attempted to chat with the screenshot. @lars-hagen Maybe it just needs to pull in main? @lily-de is this happening for you also?

[screenshot]

zanesq avatar May 12 '25 19:05 zanesq

Looks like this has been done in a different way in the meantime. Is this still relevant?

michaelneale avatar Jun 17 '25 02:06 michaelneale

No, I'll close it down. Thanks for the reminder, and good job team goose on your implementation of multi-modal, it looks nice :)

lars-hagen avatar Jun 17 '25 22:06 lars-hagen