agent-lightning

Multimodal support?

Open Osilly opened this issue 3 months ago • 7 comments

Great work! I would like to know whether this framework supports multimodal input from agents. For example, could it handle image and text responses from agents (perhaps similar to OpenAI o3)?

Osilly avatar Sep 22 '25 11:09 Osilly

We have people working on an example. There is currently no blocking issue.

matluster avatar Sep 23 '25 00:09 matluster

Thanks! By the way, does this example integrate the image tokens into the rollout trajectory, so that the base model can use them to generate reasoning for the next step, i.e., think with images?

Osilly avatar Sep 23 '25 07:09 Osilly

Really looking forward to the support for multimodal.

Qiao0124 avatar Sep 23 '25 12:09 Qiao0124

Excuse me, but I'd like to know the timeline for supporting multimodal models like Qwen3-VL, as I am looking for a framework for a project involving multimodal ReAct. Do you plan to support them in the near future?

Osilly avatar Oct 19 '25 12:10 Osilly

Really looking forward to the support for multimodal.

Marquis03 avatar Oct 22 '25 08:10 Marquis03

mark

xs1997zju avatar Oct 28 '25 07:10 xs1997zju

expect

WGowi avatar Nov 06 '25 01:11 WGowi