Wenxuan Huang
Thanks! By the way, I'd like to ask whether this example integrates the image tokens into the rollout trajectory, so that the base model can use these image tokens to generate reasoning...
Excuse me, but I'd like to know the timeline for supporting multimodal models like Qwen3-VL, as I am looking for a framework to complete a project involving multimodal ReAct. Do you...
@LiWentomng Thanks for your response! I'd like to know why there is a significant gap in the inference times in Figure 4 between LLaVA-TokenPacker and the official LLaVA-1.5. In our experiments, the image token...
I have the same problem. How can it be solved?