LibreChat
[Enhancement]: Integration of Gemini 2.0 (text and images output)
What features would you like to see added?
Adding support for generating a combination of text and images as output would be great.
More details
With gemini-2.0-flash-exp, Gemini now supports generating both text and images as output.
To access this feature on GCP, navigate to ‘Vertex AI’ → ‘Vertex AI Studio’ → ‘Freeform’, select ‘gemini-2.0-flash-exp’ as the model, and choose ‘Image and text’ as the response output type.
To support this, an option needs to be added to the UI to enable this kind of output when the chosen model supports it. The endpoints probably need to be extended as well to handle it.
Note: Currently the feature is still experimental and needs to be activated.
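For reference, enabling this on the Gemini API side comes down to setting the response modalities in the generation config. A minimal sketch of the request body, assuming the public Gemini `generateContent` REST field names (the prompt text is just an example; LibreChat's own endpoint wiring may shape this differently):

```json
{
  "contents": [
    { "role": "user", "parts": [{ "text": "Draw a duck wearing a hat" }] }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
```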
Which components are impacted by your request?
Endpoints, UI
Pictures
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
+1, this would be really powerful for us as it eliminates the need for an image generation model.
Gemini returns the image in base64 format rather than as a URL, so I think LibreChat's message markdown parser should allow base64 images. Also, `response_modalities = ["image", "text"]` should be set when using the `gemini-2.0-flash-exp` model. The response format for a generated image is `{ "inlineData": { "mimeType": "image/png", "data": "iV......(base64png)" } }`, but this `inlineData` is currently ignored by the completion parser.
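One way a parser could surface such a part without any hosted URL is to turn it into a markdown image with a data URI, which the existing markdown renderer can already display. A minimal sketch (the helper name is hypothetical, not LibreChat's actual parser code):

```javascript
// Convert a Gemini inlineData part into a markdown image using a data URI.
// (Hypothetical helper, not LibreChat's actual internals.)
function inlineDataToMarkdown(part) {
  if (!part || !part.inlineData) {
    return null;
  }
  const { mimeType, data } = part.inlineData;
  return `![generated image](data:${mimeType};base64,${data})`;
}

// Example with a placeholder base64 payload standing in for a real PNG:
const md = inlineDataToMarkdown({
  inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' },
});
console.log(md);
// → ![generated image](data:image/png;base64,iVBORw0KGgo=)
```

Very large base64 payloads can bloat the message store, so a real implementation might instead persist the bytes and link a served file, but the data-URI route needs no storage changes.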
Is it under development?
Thanks for working on images with Google models. As another data point, I am trying to return images generated on an MCP server through Gemini models, and they also don't display. The Python MCP server returns images as a base64-encoded PNG using the ImageContent type (https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/types.py#L634). With OpenAI models, the result is displayed in chat, but none of the Gemini models will display it; depending on the model, it may attempt to display it and just spit out text.
I've tried poking around with adjustments in MCP.js, thinking this might be the switch, without any luck (https://github.com/danny-avila/LibreChat/blob/6dd1b3988651ea4b56b2b84d9ae8e042fbdd0bc1/api/server/services/MCP.js#L69). I gather from this and other linked issues that it's a bit harder than a few tweaks. Happy to help with testing or implementation with some pointers.
@chapmanb Please read the notes on MCP image outputs here:
https://www.librechat.ai/docs/features/image_gen#5--model-context-protocol-mcp
TL;DR: you need to format your tool output exactly as specified by MCP for images. Few MCP servers do this correctly; look at the puppeteer MCP source code to see the correct format.
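For anyone comparing their server's output, the image result shape the puppeteer server emits looks roughly like this (field names per MCP's ImageContent type; the base64 string is a placeholder, not a real PNG):

```javascript
// Sketch of an MCP tool call result carrying an image, modeled on the
// puppeteer server's output. The base64 data below is a placeholder.
const toolResult = {
  content: [
    {
      type: 'image',          // must be the literal string "image"
      data: 'iVBORw0KGgo=',   // base64-encoded image bytes, no data: prefix
      mimeType: 'image/png',
    },
  ],
  isError: false,
};

console.log(toolResult.content[0].type); // → image
```

A common mistake is returning a `text` content item containing a data URI instead of a proper `image` item, which clients then render as raw text.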
Thanks so much for the response, and for all your work on LibreChat. I wrote the MCP server, and it returns the image in the same format as the puppeteer example (https://github.com/modelcontextprotocol/servers/blob/main/src/puppeteer/index.ts#L267 for anyone else following). Mine is written in Python, but I'm using the equivalent ImageContent type from the Python SDK. With OpenAI, the tool does render correctly, and the result seems to be recognized and processed differently. For Google tools, it appears in the result, so it is maybe being interpreted as text, whereas for OpenAI it does not appear in the tool call output. This all led me to think there is something different in the Google message handling of MCP tool calls, but I can't figure out what.
is support for gemini native image gen output going to happen somehow soon? :) @danny-avila
I stumbled upon this thread because I'm trying to get gemini-2.5-flash-image-preview to return images :) It'll give a reply like "Here's an image with a duck for you", but that's it, no images.
+1