
Can you open source the code of VisualPRM-8B?

manglu097 opened this issue 8 months ago

Thanks!

manglu097 avatar Apr 08 '25 10:04 manglu097

Thank you for your interest in our work. We have released the model and training data of VisualPRM. The model code is also released on Hugging Face. The training code of VisualPRM is the same as the fine-tuning code of InternVL. You can refer to our fine-tuning document and set the data path to our VisualPRM400K.
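For illustration, a minimal sketch of such a data configuration, assuming the standard InternVL fine-tuning meta-file format described in the fine-tuning document; the paths, file names, and sample count below are placeholders, not an official configuration:

import json

# Minimal sketch of a fine-tuning meta file pointing at VisualPRM400K.
# Every path and the "length" count are placeholders for your local copy of the data.
meta = {
    "visualprm400k": {
        "root": "/path/to/VisualPRM400K/images",              # image root directory
        "annotation": "/path/to/VisualPRM400K/annotations.jsonl",  # conversation-style annotations
        "data_augment": False,
        "repeat_time": 1,
        "length": 400000,                                      # number of training samples (placeholder)
    }
}

with open("visualprm400k_meta.json", "w") as f:
    json.dump(meta, f, indent=2)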

Weiyun1025 avatar Apr 18 '25 06:04 Weiyun1025

BTW, we evaluate our model with VLMEvalKit; you can refer to the evaluation code to see how to use VisualPRM to select the best response.

Weiyun1025 avatar Apr 18 '25 06:04 Weiyun1025

> BTW, we evaluate our model with VLMEvalKit; you can refer to the evaluation code to see how to use VisualPRM to select the best response.

Hello, thank you for your guidance. However, I have some questions about the inference code you mentioned. I am using the VisualPRM model as the reward_model_path in vlmeval/vlm/internvl/internvl_chat.py, and I would like to know whether simply setting the best_of_n parameter to 8 is sufficient.

Do I need to define a specific scoring strategy for the reward model?
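In other words, I currently configure it roughly as below (paths are placeholders, and I am assuming the InternVLChat wrapper exposes reward_model_path and best_of_n as keyword arguments; the exact signature should be checked against vlmeval/vlm/internvl/internvl_chat.py):

from vlmeval.vlm.internvl.internvl_chat import InternVLChat

# Hypothetical configuration: the policy model samples best_of_n candidate
# answers and the VisualPRM reward model re-ranks them. Paths are placeholders.
model = InternVLChat(
    model_path="OpenGVLab/InternVL2_5-8B",
    reward_model_path="path/to/VisualPRM-8B-v1_1",
    best_of_n=8,
)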

YuanDaoze avatar May 12 '25 16:05 YuanDaoze

No, you can use the default strategy to reproduce our results.

Weiyun1025 avatar May 13 '25 03:05 Weiyun1025

> No, you can use the default strategy to reproduce our results.

Thank you for your timely reply. I will try to reproduce the results of the BoN experiments.

I am also confused about select_best_response in the overall code, as I am not sure where it is defined. I believe this is the key to how VisualPRM selects the best answer, and finding it would help me understand how VisualPRM works.

if self.best_of_n > 1:
    response_list = self.reward_model.select_best_response(
        tokenizer=self.reward_tokenizer,
        question=prompt,
        response_list=response_list,
        pixel_values=pixel_values,
        num_patches_list=num_patches_list,
    )
response = response_list[0]
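My current understanding is that best-of-N selection with a process reward model works roughly like the sketch below; this is only my own outline, not the actual implementation in the checkpoint, and score_steps stands in for whatever per-step scoring the remote code performs:

# Illustrative outline only: rank the candidate responses by the mean of their
# per-step PRM scores and return them best-first, so response_list[0] is the
# selected answer.
def select_best_response(score_steps, question, response_list, **kwargs):
    scored = []
    for response in response_list:
        step_scores = score_steps(question, response, **kwargs)  # one score per reasoning step
        overall = sum(step_scores) / max(len(step_scores), 1)    # aggregate, e.g. the mean
        scored.append((overall, response))
    scored.sort(key=lambda pair: pair[0], reverse=True)           # highest score first
    return [response for _, response in scored]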

YuanDaoze avatar May 13 '25 06:05 YuanDaoze

> No, you can use the default strategy to reproduce our results.

I am trying to reproduce VisualPRM. According to the authors, inference and evaluation were done with the VLMEvalKit project, but I ran into problems in my actual tests.

My setup: 6x A100 (40 GB each); dataset: MMLU_DEV

1. Using CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nproc-per-node=6

Each card instantiates its own copy of the models for testing. After instantiating the two models (InternVL2_5-8B-BoN-8: InternVL2_5-8B / VisualPRM-8B-v1_1), a 40 GB card always hits OOM halfway through the run, so the remaining cards cannot merge their results afterwards and wait until they time out.
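A rough estimate of the weights alone (assuming both 8B models are loaded in bf16) already shows how tight 40 GB is:

# Rough estimate only: model weights, ignoring activations and the KV cache.
params_per_model = 8e9          # ~8B parameters each
bytes_per_param = 2             # bf16
weights_gb = 2 * params_per_model * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.0f} GB")   # ~30 GB of the 40 GB card
# Generating 8 candidates (BoN-8) plus ViT activations and the KV cache
# then has to fit into the remaining ~10 GB.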

2. Using CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nproc-per-node=3

Every 2 cards share one model instance for testing; this fails with:

Traceback (most recent call last):
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/run.py", line 349, in main
    model = infer_data_job(
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/inference.py", line 191, in infer_data_job
    model = infer_data(
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/inference.py", line 154, in infer_data
    response = model.generate(message=struct, dataset=dataset_name)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/base.py", line 116, in generate
    return self.generate_inner(message, dataset)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 436, in generate_inner
    return self.generate_v2(message, dataset)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 403, in generate_v2
    response_list = self.reward_model.select_best_response(
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 519, in select_best_response
    steps_with_score = self.generate_steps_with_soft_score(
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 472, in generate_steps_with_soft_score
    logits = self(
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 110, in forward
    vit_embeds = vit_embeds[image_flags == 1]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
^CW0517 22:44:37.587000 71565 site-packages/torch/distributed/elastic/agent/server/api.py:704] Received Signals.SIGINT death signal, shutting down workers

The root cause is tensor placement across devices, but the failing code in modeling_internvl_chat.py is pulled remotely from Hugging Face, so I cannot modify it to move everything onto a single device.
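The only workaround I can think of (untested) is to download the checkpoint locally so the remote-code file becomes editable, or to pin the whole reward model to a single GPU so that accelerate does not shard it across devices; roughly:

import torch
from transformers import AutoModel

# Untested idea: load the reward model entirely on one GPU (no device_map
# sharding), so pixel_values and image_flags stay on the same device.
# The repo id below is a placeholder for the actual VisualPRM checkpoint path.
reward_model = AutoModel.from_pretrained(
    "OpenGVLab/VisualPRM-8B-v1_1",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().to("cuda:1")

# Before calling select_best_response, move the visual inputs to that device, e.g.:
# pixel_values = pixel_values.to(next(reward_model.parameters()).device)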

3. Using CUDA_VISIBLE_DEVICES=0 python run.py

Single-card test; it suddenly hits OOM during the run.

4. Using CUDA_VISIBLE_DEVICES=0,1 python run.py for a multi-card test

Same error as case 2.

YuanDaoze avatar May 17 '25 16:05 YuanDaoze