[Enhancement]: Integration of Gemini 2.0 (text and images output)

Open goreply-de opened this issue 9 months ago • 3 comments

What features would you like to see added?

Adding support to generate a combination of text and image as output would be great.

More details

With gemini-2.0-flash-exp Gemini can now support the generation of text and images as output.

To access this feature on GCP you need to navigate to ‘Vertex AI’ → ‘Vertex AI Studio’ → ‘Freeform’ → Select ‘gemini-2.0-flash-exp’ as model and ‘Image and text’ as response output type.

To support this, the UI needs an option for this kind of output when the chosen model supports it. The endpoints probably need to be extended as well to handle it.

Note: Currently the feature is still experimental and needs to be activated.

Which components are impacted by your request?

Endpoints, UI

Pictures

Image

Image

Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

goreply-de avatar Feb 26 '25 14:02 goreply-de

+1, this would be really powerful for us, as it eliminates the need for an image generation model.

marlonka avatar Feb 27 '25 18:02 marlonka

Gemini returns the image in base64 format, not as a URL. I think LibreChat should allow the message Markdown parser to render base64 images. Also, response_modalities = ["image", "text"] should be added when using the "gemini-2.0-flash-exp" model. The response format for a generated image is: { "inlineData": { "mimeType": "image/png", "data": "iV......(base64png)" } }, but this inlineData is also ignored by the completion parser.
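As a rough illustration of the parsing step described above, here is a minimal sketch in plain Python. It assumes the response parts arrive as a list of dicts, where each part carries either a "text" field or the "inlineData" shape quoted in this comment. The function name parts_to_markdown and the alt text are hypothetical; this is not LibreChat's completion parser, only one way the idea could look.

```python
import base64
import binascii


def parts_to_markdown(parts):
    """Render Gemini-style response parts as Markdown.

    Assumption: each part is a dict with either a "text" key or an
    "inlineData" key shaped like {"mimeType": ..., "data": <base64>}.
    Images are inlined as data URIs so a Markdown renderer can show them.
    """
    chunks = []
    for part in parts:
        if "text" in part:
            chunks.append(part["text"])
            continue
        inline = part.get("inlineData")
        if not inline:
            continue  # unknown part type: skip rather than guess
        mime = inline.get("mimeType", "")
        data = inline.get("data", "")
        if not mime.startswith("image/"):
            continue  # only images are handled in this sketch
        try:
            # Validate the payload before embedding it in the message.
            base64.b64decode(data, validate=True)
        except (binascii.Error, ValueError):
            continue
        chunks.append(f"![generated image](data:{mime};base64,{data})")
    return "\n\n".join(chunks)
```

Whether a data URI is the right transport is a separate question: large images make messages heavy, so storing the decoded bytes as a file and linking to it may be the better choice in practice.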

nidasfly avatar Mar 18 '25 19:03 nidasfly

Is it under development?

Sapkotaanish avatar Mar 26 '25 04:03 Sapkotaanish

Thanks for working on images with Google models. As another data point, I am trying to return images generated on an MCP server through Gemini models, and they also don't display. The Python MCP server returns images as a base64-encoded PNG using the ImageContent type (https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/types.py#L634). With OpenAI models, the result is displayed in chat, but with any of the Gemini models it is not displayed and, depending on the model, the model may attempt to display it by just spitting out text.

I've tried to poke around with adjustments in MCP.js thinking this might be the switch without any luck (https://github.com/danny-avila/LibreChat/blob/6dd1b3988651ea4b56b2b84d9ae8e042fbdd0bc1/api/server/services/MCP.js#L69). I'm getting from this and other linked issues that it's a bit harder than a few tweaks. Happy to help with testing or implementation with some pointers.

Image

chapmanb avatar May 12 '25 14:05 chapmanb

@chapmanb Please read the notes on MCP image outputs here:

https://www.librechat.ai/docs/features/image_gen#5--model-context-protocol-mcp

TL;DR: you need to format your tool output exactly as specified by MCP for images. Few MCP servers do this correctly; look at the puppeteer MCP source code to see the correct format.

danny-avila avatar May 12 '25 14:05 danny-avila

Thanks so much for the response, and for all your work on LibreChat. I wrote the MCP server, and it returns the image in the same format as the puppeteer example (https://github.com/modelcontextprotocol/servers/blob/main/src/puppeteer/index.ts#L267 for anyone else following). Mine is written in Python, but I'm using the equivalent ImageContent type from the Python SDK. With OpenAI models, the tool output renders correctly, and the result seems to be recognized and processed differently. With Google models, the image data appears in the tool call output, so it is maybe being interpreted as text, whereas with OpenAI it does not appear there. This all led me to think there is something in the Google message handling of MCP tool calls, but I can't figure out what's different.
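For anyone comparing formats, here is a small sketch of the result shape I mean, written as plain dicts rather than with the SDK. The field names ("type", "data", "mimeType") follow the MCP image content type. The helper names image_tool_result and find_images are hypothetical and only there for illustration.

```python
import base64


def image_tool_result(png_bytes, caption=""):
    """Build a dict shaped like an MCP tool result carrying one PNG image.

    Assumption: the client looks for content items with type "image",
    a base64 string in "data", and a MIME type in "mimeType".
    """
    content = []
    if caption:
        content.append({"type": "text", "text": caption})
    content.append({
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    })
    return {"content": content, "isError": False}


def find_images(result):
    """Return the image items of a tool result (hypothetical client-side check)."""
    return [item for item in result.get("content", []) if item.get("type") == "image"]
```

If a client serializes the whole result to text before handing it to the model, the image item ends up as a long base64 string in the tool output, which would match what I'm seeing with the Google models.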

Image

chapmanb avatar May 12 '25 14:05 chapmanb

is support for gemini native image gen output going to happen somehow soon? :) @danny-avila

marlonka avatar Aug 26 '25 17:08 marlonka

> is support for gemini native image gen output going to happen somehow soon? :) @danny-avila

I stumbled upon this thread because I'm trying to get gemini-2.5-flash-image-preview to return images :) It'll give a reply like "Here's an image with a duck for you", but that's it, no images.

mikkelnl avatar Aug 31 '25 10:08 mikkelnl

+1

sunsky89757 avatar Sep 09 '25 17:09 sunsky89757