Yue Cao
> > > I have just downloaded model htc++_beit_adapter_large_fpn_3x_coco.pth and config from this github. But I cannot load model use this command: > > > from mmdet.apis import init_detector configFile...
Thank you for your feedback. Please check carefully whether the compilation script make.sh ran successfully without reporting any errors. Or would it be convenient for you to...
Thank you for your answer. How should the single-branch ConvNeXt in your paper be trained? When I tried to train the single-branch ConvNeXt with your code, the loss in...
Hi, please try setting `"output_hidden_states": true` in [config.json](https://huggingface.co/OpenGVLab/InternVL2_5-78B/blob/main/config.json#L65).
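If the checkpoint has already been downloaded locally, the same flag can also be toggled programmatically; a minimal sketch using the standard library (the helper name is illustrative, not part of the repo):

```python
import json
from pathlib import Path

def enable_hidden_states(checkpoint_dir: str) -> None:
    """Set "output_hidden_states": true in a checkpoint's config.json."""
    config_path = Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config["output_hidden_states"] = True
    config_path.write_text(json.dumps(config, indent=2))
```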
Hi, for LongVideoBench we report the best result among the 16-, 32-, 48-, and 64-frame settings. For the 8B model, you can [try setting the frame count to 48](https://github.com/open-compass/VLMEvalKit/blob/6c5f81cbb50ff780f33799f708c1a76bb26efa7f/vlmeval/dataset/video_dataset_config.py#L39) to test its result. See the LongVideoBench paragraph in the [technical report](https://arxiv.org/pdf/2412.05271): > We test four settings—16, 32, 48, and 64 frames—and report the best results on the validation set.
Hi, have you tried transformers version 3.37.2?
Hi, is this LoRA training? After LoRA training you need to [merge the weights](https://internvl.readthedocs.io/en/latest/tutorials/coco_caption_finetune.html#merging-lora-weights).
Hello, As stated in the original text, we force the model to output the final answer in the form of 'Final Answer: xxx' at the end. So for the final...
Hi, the current code supports saving model weights at every step; see the save_steps argument. It also supports resume_ckpt; see [line 1097](https://github.com/OpenGVLab/InternVL/blob/34a81000402bf8f716bab8c9b57aff1f6b436bd0/internvl_chat/internvl/train/internvl_chat_pretrain.py#L1097) of the training code. This is functionality implemented by the Trainer: it continues training from the corresponding step, and keeping the seed identical across runs avoids training on duplicate data.
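The point about the seed can be illustrated with a small sketch (hypothetical helpers, not the Trainer's actual implementation): as long as the shuffle seed is unchanged, resuming at step k and skipping the first k samples yields exactly the continuation of the original order, so no sample is seen twice.

```python
import random

def epoch_order(num_samples: int, seed: int) -> list[int]:
    """Deterministic shuffle of sample indices for one epoch."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    return order

def resume_order(num_samples: int, seed: int, resume_step: int) -> list[int]:
    """Order seen after resuming: same seed, skip already-consumed steps."""
    return epoch_order(num_samples, seed)[resume_step:]
```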
Hi, thanks for your question. Please look at the [llava/model/multimodal_projector](https://github.com/yuecao0119/MMFuser/tree/main/llava/model/multimodal_projector) folder; the main model modifications are in there.