
Following the `MultiModalMessage` example ends in a JSON serialization error.

Open amitv9493 opened this issue 7 months ago • 3 comments

What happened?

Describe the bug

There is a section in the AutoGen AgentChat user guide on Agents that demonstrates how to send a MultiModalMessage (with an image) to an LLM-based agent. Following the steps and running the example code as-is throws the following error:

TypeError: Object of type Image is not JSON serializable
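For context, this is the error Python's standard json encoder raises whenever an object with no JSON representation reaches json.dumps. A minimal stand-alone illustration of the failure mode (the Image class here is a hypothetical stand-in, not autogen_core.Image, and this says nothing about where in AutoGen the serialization happens):

import json

class Image:  # hypothetical stand-in, not autogen_core.Image
    pass

# Raises: TypeError: Object of type Image is not JSON serializable
json.dumps({"content": Image()})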

To Reproduce

  1. Go to the tutorial page mentioned above.
  2. Follow the code example where MultiModalMessage is created with an image using PIL.Image.
  3. Execute the code.

Expected behavior

The code should work as documented instead of throwing a serialization error.

Screenshots

Not applicable.

Additional context

Below is the code I tried to run.


import asyncio
from io import BytesIO

import PIL.Image
import requests
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image
from autogen_ext.models.openai import OpenAIChatCompletionClient
from dotenv import load_dotenv

load_dotenv(".env")

model_client = OpenAIChatCompletionClient(
    model="gemini-2.0-flash"
)

ocr_agent = AssistantAgent(
    name="ocr_agent",
    model_client=model_client,
    system_message="You are a helpful agent.",
)


async def main():
    # Download a sample image and wrap it in autogen_core.Image.
    pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
    img = Image(pil_image)

    # Send the text prompt and the image together as one multi-modal message.
    multi_modal_message = MultiModalMessage(
        content=["Can you describe the content of this image?", img],
        source="user",
    )
    result = await ocr_agent.run(task=multi_modal_message)
    print(result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())
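A quick way to check whether the failure is specific to the Gemini path is to run the same message through an OpenAI model. This is a diagnostic sketch, not a fix; it assumes an OPENAI_API_KEY is available and that gpt-4o-mini is accessible, and only the model name differs from the script above.

import asyncio
from io import BytesIO

import PIL.Image
import requests
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    # Same image and message construction as the original report.
    pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
    message = MultiModalMessage(
        content=["Can you describe the content of this image?", Image(pil_image)],
        source="user",
    )

    # Only the model differs: gpt-4o-mini instead of gemini-2.0-flash.
    agent = AssistantAgent(
        name="ocr_agent",
        model_client=OpenAIChatCompletionClient(model="gpt-4o-mini"),
        system_message="You are a helpful agent.",
    )
    result = await agent.run(task=message)
    print(result.messages[-1].content)


if __name__ == "__main__":
    asyncio.run(main())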

Which packages was the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

AutoGen library version.

Python dev (main branch)

Other library version.

No response

Model used

gemini-2.0-flash

Model provider

Google Gemini

Other model provider

No response

Python version

3.12

.NET version

None

Operating system

Ubuntu

— amitv9493, Apr 30 '25

Okay, I'll resolve it. I think I know what the problem is.

— SongChiYoung, Apr 30 '25

Same issue with Gemini models.

— sasan-hashemi, Apr 30 '25

@sasan-hashemi I see... haha. I know what this issue is and have resolved it.

— SongChiYoung, Apr 30 '25

We explored AutoGen's potential for multi-agent file processing with Gemini. However, we have decided to move to another framework that aligns more closely with our current needs for this specific use case.

— amitv9493, May 01 '25