Wenxuan Huang comments


4 comments by Wenxuan Huang

Multimodal support?

Thanks! By the way, I'd like to ask whether this example integrates the image tokens into the rollout trajectory, so that the base model can use these image tokens to generate reasoning...

Multimodal support?

Excuse me, but I'd like to know the timeline for supporting multimodal models such as Qwen3-VL, as I am looking for a framework for a project involving multimodal ReAct. Do you...

About the inference times reported in Figure 4 and Table 3

@LiWentomng Thanks for your response! Could you explain why there is such a significant gap in inference time between LLaVA-TokenPacker and the official LLaVA-1.5 in Figure 4? In our experiments, the image token...

step1 parameter update takes extremely long

Same problem here. How can it be solved?