Add ability to send images to the Assistant

Open andreibondarev opened this issue 2 years ago • 2 comments

You should be able to provide an image_url to the Assistant for the supported multi-modal LLMs:

  • [x] OpenAI support (https://github.com/patterns-ai-core/langchainrb/pull/799)
  • [x] Mistral AI support (https://github.com/patterns-ai-core/langchainrb/pull/803)
  • [ ] Ollama
  • [ ] Anthropic
  • [ ] Google Gemini
  • [ ] Google Vertex AI

Note

Some of the LLMs do not accept an image_url; Anthropic expects a Base64-encoded payload, and Google Gemini expects a file URI for a file uploaded to the cloud. We need to figure out how to handle this.
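For illustration, the Assistant-level API could look roughly like the sketch below. This is only a proposal: the image_url: keyword on add_message is an assumed shape, not something implemented yet.

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
assistant = Langchain::Assistant.new(llm: llm)

# image_url: is the proposed (assumed) keyword for attaching an image to a message
assistant.add_message(
  content: "What's in this image?",
  image_url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
)
assistant.run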

andreibondarev avatar Dec 09 '23 00:12 andreibondarev

Hi @andreibondarev, I noticed that the current version already supports sending images to LLMs.

You just need to include the image within the messages parameter. For example, when using OpenAI models, you can include images using the image_url content type. Here's how:

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          # OpenAI's vision-capable models accept a publicly reachable image URL
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ],
  model: "gpt-4o"
).completion # extract the model's text reply from the response

Other LLMs only accept images as Base64-encoded payloads, but this is still done within the messages parameter, as in the sketch below.
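For instance, a minimal sketch for Anthropic, assuming Langchain::LLM::Anthropic passes the messages array through to the Anthropic Messages API unchanged; the local filename is hypothetical:

require "langchain"
require "base64"

llm = Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])

# Anthropic expects the raw image bytes Base64-encoded inside the message content
image_data = Base64.strict_encode64(File.binread("boardwalk.jpg")) # hypothetical local file

llm.chat(
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/jpeg", data: image_data }
        },
        { type: "text", text: "What's in this image?" }
      ]
    }
  ]
).chat_completion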

dghirardo avatar Jun 04 '24 18:06 dghirardo

Support for OpenAI was added with https://github.com/patterns-ai-core/langchainrb/pull/799.

andreibondarev avatar Sep 30 '24 19:09 andreibondarev