Multimodal support?
Great work! I would like to know whether this framework supports multimodal input from agents. For example, could it handle agents that return both images and text (perhaps similar to OpenAI o3)?
We have people working on an example. There is currently no blocking issue.
Thanks! By the way, I'd like to ask whether this example integrates the image tokens into the rollout trajectory, so that the base model can use these image tokens when generating its reasoning for the next step, i.e., think with images.
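To make the question concrete, here is a minimal sketch (in Python) of what an interleaved multimodal rollout trajectory could look like. This is not the framework's actual API; every class and field name here is hypothetical. The idea is that image observations returned by the environment are appended as placeholder tokens plus pixel features, excluded from the loss, but visible to the policy at the next generation step.

```python
# Hypothetical sketch of a rollout trajectory that interleaves text and image segments.
# None of these names come from the framework; they only illustrate the question above.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One contiguous span of the trajectory: either text tokens or an image observation."""
    token_ids: List[int]                 # text token ids, or image placeholder ids
    pixel_values: Optional[list] = None  # raw image features for the vision encoder, if any
    is_model_output: bool = False        # True for tokens the policy generated (used for loss masking)


@dataclass
class Trajectory:
    segments: List[Segment] = field(default_factory=list)

    def add_text(self, token_ids: List[int], generated: bool) -> None:
        self.segments.append(Segment(token_ids, is_model_output=generated))

    def add_image(self, placeholder_ids: List[int], pixel_values: list) -> None:
        # Environment/tool observations (e.g. a screenshot) are appended as image
        # placeholder tokens plus pixel features; they are not trained on, but the
        # next generation step can attend to them.
        self.segments.append(Segment(placeholder_ids, pixel_values=pixel_values))

    def input_ids(self) -> List[int]:
        return [t for seg in self.segments for t in seg.token_ids]

    def loss_mask(self) -> List[int]:
        return [1 if seg.is_model_output else 0 for seg in self.segments for _ in seg.token_ids]


# Example multi-turn ReAct-style rollout: prompt -> reasoning -> image observation -> next reasoning
traj = Trajectory()
traj.add_text([101, 102, 103], generated=False)                    # user prompt tokens
traj.add_text([201, 202], generated=True)                          # model's first reasoning/action
traj.add_image([32000, 32000, 32000], pixel_values=[[0.1, 0.2]])   # returned image observation
traj.add_text([301, 302, 303], generated=True)                     # next step: model "thinks with" the image
print(traj.input_ids(), traj.loss_mask())
```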
Really looking forward to multimodal support.
Excuse me, I'd like to know the timeline for supporting multimodal models such as Qwen3-VL, as I am looking for a framework to complete a project involving multimodal ReAct. Do you plan to support them in the near future?
Marking this thread to follow.
Looking forward to this feature as well.