Following the `MultiModalMessage` example results in a JSON serialization error.
What happened?
Describe the bug
There is a section on Agents in the AutoGen AgentChat User Guide that demonstrates how to send a MultiModalMessage (with an image) to an LLM-based agent. Running the example code as-is raises the following error:
TypeError: Object of type Image is not JSON serializable
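For readers unfamiliar with this class of error, here is a minimal sketch of how Python's standard `json` module produces it and one common way around it. This is not the AutoGen code path or the eventual fix: `FakeImage` and `encode_extra` are hypothetical names made up for illustration, and the base64 encoding is just an assumed representation.

```python
import base64
import json


class FakeImage:
    """Hypothetical stand-in for an image wrapper that holds raw bytes."""

    def __init__(self, data: bytes) -> None:
        self.data = data


def encode_extra(obj):
    """Fallback hook for json.dumps: encode FakeImage as a base64 string."""
    if isinstance(obj, FakeImage):
        return {
            "type": "image",
            "base64": base64.b64encode(obj.data).decode("ascii"),
        }
    # Mirror the standard error for anything else we cannot handle.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


message = {
    "source": "user",
    "content": ["Describe this image.", FakeImage(b"\x89PNG")],
}

# Without a fallback hook, json.dumps cannot serialize the custom object.
try:
    json.dumps(message)
    error_text = None
except TypeError as exc:
    error_text = str(exc)

print(error_text)

# With the hook, the image is converted to a JSON-friendly dict first.
serialized = json.dumps(message, default=encode_extra)
print(serialized)
```

The same pattern applies whenever a message containing a non-primitive object is passed to `json.dumps` somewhere along the way: either the object is converted before serialization or a `default` hook handles it.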
To Reproduce
- Go to the tutorial page mentioned above.
- Follow the code example where MultiModalMessage is created with an image using PIL.Image.
- Execute the code.
Expected behavior
The code should work as documented instead of throwing a serialization error.
Screenshots
Not applicable.
Additional context
Below is the code I tried to run.
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import MultiModalMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from dotenv import load_dotenv
from autogen_core import Image
import PIL
from io import BytesIO
import requests

load_dotenv(".env")

model_client = OpenAIChatCompletionClient(
    model="gemini-2.0-flash"
)

ocr_agent = AssistantAgent(
    name='ocr_agent',
    model_client=model_client,
    system_message="""You are a helpful agent."""
)

async def main():
    pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
    img = Image(pil_image)
    multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")
    result = await ocr_agent.run(task=multi_modal_message)
    print(result.messages[-1].content)

if __name__ == "__main__":
    asyncio.run(main())
Which packages was the bug in?
Python AgentChat (autogen-agentchat>=0.4.0)
AutoGen library version.
Python dev (main branch)
Other library version.
No response
Model used
gemini-2.0-flash
Model provider
Google Gemini
Other model provider
No response
Python version
3.12
.NET version
None
Operating system
Ubuntu
Okay, I'll resolve it. I think I know what the problem is.
Same issue with Gemini models.
@sasan-hashemi I see... haha. I know what this issue is and have resolved it.
We've explored its potential for multi-agent file processing with Gemini. However, we've decided to move to another framework that aligns more closely with our current needs for this specific use case.