Can you open source the code of VisualPRM-8B?
Thanks!
Thank you for your interest in our work. We have released the model and training data of VisualPRM. The model code is also available on Hugging Face. The training code of VisualPRM is the same as the fine-tuning code of InternVL: you can refer to our fine-tuning document and set the data path to our VisualPRM400K.
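For instance, the meta file passed to the fine-tuning script could point at VisualPRM400K roughly like this. This is only a minimal sketch assuming InternVL's documented meta-file format; all paths and the sample count below are placeholders you should adapt:

```python
import json

# Minimal sketch of an InternVL fine-tuning meta file that points the
# data path at VisualPRM400K. The keys follow InternVL's meta-file
# format; all paths and the sample count are placeholders.
meta = {
    'visualprm400k': {
        'root': 'data/VisualPRM400K/images',                    # placeholder image root
        'annotation': 'data/VisualPRM400K/annotations.jsonl',   # placeholder annotation file
        'data_augment': False,
        'repeat_time': 1,
        'length': 400000,  # placeholder sample count
    }
}

with open('visualprm400k_meta.json', 'w') as f:
    json.dump(meta, f, indent=2)
```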
BTW, we evaluate our model with VLMEvalKit; you can refer to the evaluation code to see how VisualPRM is used to select the best response.
Hello, thank you for your guidance. However, I have some doubts about the inference code you mentioned. I am using the VisualPRM model as the `reward_model_path` in `vlmeval/vlm/internvl/internvl_chat.py`, and I would like to know whether simply setting the `best_of_n` parameter to 8 is sufficient.
Do I need to define a specific scoring strategy for the reward model?
No, you can use the default strategy to reproduce our results.
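For example, a registration along these lines should be enough. This is a hypothetical sketch mirroring the style of entries in `vlmeval/config.py`; the model paths are placeholders, and `reward_model_path` / `best_of_n` are the parameters discussed above:

```python
from functools import partial
from vlmeval.vlm import InternVLChat

# Hypothetical entry in the style of vlmeval/config.py;
# model paths are placeholders, adapt to your local checkpoints.
internvl2_5_8b_bon_8 = partial(
    InternVLChat,
    model_path='OpenGVLab/InternVL2_5-8B',
    reward_model_path='OpenGVLab/VisualPRM-8B',  # the parameter from the discussion above
    best_of_n=8,                                 # sample 8 candidates, keep the PRM's best
)
```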
Thank you for your timely reply. I will try to reproduce the results of the BoN experiments.
I am also confused about `select_best_response` in the overall code, as I am not sure where it is defined. I believe this is the key to how VisualPRM selects the best answer, and understanding it would help me see how VisualPRM works:
```python
if self.best_of_n > 1:
    response_list = self.reward_model.select_best_response(
        tokenizer=self.reward_tokenizer,
        question=prompt,
        response_list=response_list,
        pixel_values=pixel_values,
        num_patches_list=num_patches_list,
    )
response = response_list[0]
```
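(For reference: the traceback later in this thread shows that `select_best_response` is defined in the model's remote code, `modeling_internvl_chat.py`, fetched and cached from the Hub. Conceptually, best-of-N selection with a PRM looks roughly like the sketch below. This is a simplified illustration, not the actual implementation; `score_fn` is a hypothetical stand-in for the PRM's per-step soft scoring.)

```python
def select_best_response_sketch(question, response_list, score_fn):
    """Simplified best-of-N selection: score every candidate response
    with a PRM and return the candidates sorted best-first.

    score_fn(question, response) -> list[float] is a hypothetical
    stand-in returning one soft score per reasoning step.
    """
    scored = []
    for response in response_list:
        step_scores = score_fn(question, response)
        # Aggregate per-step soft scores into one response-level score.
        avg = sum(step_scores) / max(len(step_scores), 1)
        scored.append((avg, response))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [response for _, response in scored]
```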
I am reproducing VisualPRM. According to the authors, inference and evaluation were done with the VLMEvalKit project, but I ran into problems in my own tests.
My setup: 6x A100 / 40GB; dataset: MMLU_DEV
1. Using `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nproc-per-node=6`
One model instance per card. With the two models (InternVL2_5-8B-BoN-8: InternVL2_5-8B / VisualPRM-8B-v1_1) instantiated on a 40GB card, one card always runs OOM halfway through, so the remaining cards can never complete and merge their results, and the job eventually times out.
2. Using `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nproc-per-node=3`
One model instance per pair of cards; this raises the following error:
```
Traceback (most recent call last):
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/run.py", line 349, in main
    model = infer_data_job(
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/inference.py", line 191, in infer_data_job
    model = infer_data(
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/inference.py", line 154, in infer_data
    response = model.generate(message=struct, dataset=dataset_name)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/base.py", line 116, in generate
    return self.generate_inner(message, dataset)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 436, in generate_inner
    return self.generate_v2(message, dataset)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/u22202140/xt/Reward_Model/VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 403, in generate_v2
    response_list = self.reward_model.select_best_response(
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 519, in select_best_response
    steps_with_score = self.generate_steps_with_soft_score(
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 472, in generate_steps_with_soft_score
    logits = self(
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u22202140/anaconda3/envs/xt_MLLM/lib/python3.10/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u22202140/.cache/huggingface/modules/transformers_modules/VisualPRM-8B-v1_1/modeling_internvl_chat.py", line 110, in forward
    vit_embeds = vit_embeds[image_flags == 1]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
^CW0517 22:44:37.587000 71565 site-packages/torch/distributed/elastic/agent/server/api.py:704] Received Signals.SIGINT death signal, shutting down workers
```
The root cause looks like tensor device placement, but the failing code in modeling_internvl_chat.py is pulled remotely from HF, so I cannot modify it to unify everything onto one device (see the workaround sketch after this list).
3. Using `CUDA_VISIBLE_DEVICES=0 python run.py`
Single-card test; it suddenly hits OOM partway through the run.
4. Using `CUDA_VISIBLE_DEVICES=0,1 python run.py` for a multi-card test
Same error as case 2.
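A workaround I am considering for case 2: keep each model entirely on a single GPU instead of letting accelerate shard it across cards, so `image_flags` and the indexed tensor stay on the same device. A minimal sketch, assuming the standard `transformers` loading path; the model IDs are placeholders:

```python
import torch
from transformers import AutoModel

# Load each model entirely on one GPU so image_flags and vit_embeds
# never end up on different devices; model IDs are placeholders.
policy = AutoModel.from_pretrained(
    'OpenGVLab/InternVL2_5-8B',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map={'': 0},  # whole policy model on cuda:0
).eval()

reward = AutoModel.from_pretrained(
    'OpenGVLab/VisualPRM-8B',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map={'': 1},  # whole reward model on cuda:1
).eval()
```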