LLaMA-Factory Reward Model Inference

Reminder

[x] I have read the above rules and searched the existing issues.

System Info

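Is there a concrete example of running inference with a reward model once training is finished? What kind of data does it need, and which command runs RM inference? I have loaded the LoRA model and computed scores, but I am not sure whether the scores are correct.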
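A reward model trained on pairwise data only learns that the chosen response should score higher than the rejected one, so its raw scores have no fixed offset or scale. A practical way to check whether the computed scores are "correct" is to score held-out (chosen, rejected) pairs and measure how often chosen wins. The sketch below assumes the RM was trained with the usual pairwise (Bradley-Terry) loss -log sigmoid(r_chosen - r_rejected). The helper name `pairwise_stats` and the example scores are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_stats(chosen_scores, rejected_scores):
    """Return (accuracy, mean P(chosen preferred)) for paired RM scores."""
    assert len(chosen_scores) == len(rejected_scores) and len(chosen_scores) > 0
    wins = 0
    probs = []
    for c, r in zip(chosen_scores, rejected_scores):
        wins += int(c > r)
        # Bradley-Terry probability that the chosen response is preferred.
        probs.append(sigmoid(c - r))
    n = len(chosen_scores)
    return wins / n, sum(probs) / n


# Hypothetical scores for three held-out preference pairs.
acc, mean_prob = pairwise_stats([1.2, 0.3, -0.5], [0.4, 0.9, -1.5])
print(f"accuracy={acc:.3f}, mean P(chosen>rejected)={mean_prob:.3f}")
```

An accuracy well above 0.5 on unseen pairs suggests the value head and the score extraction are wired up correctly; an accuracy near 0.5 points to a loading or indexing problem.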

Reproduction

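import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from trl import AutoModelForCausalLMWithValueHead

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_path_merge_rm, device_map="cpu")
model = AutoModelForCausalLMWithValueHead.from_pretrained(model)  # wrap with a scalar value head
vhead_params = load_valuehead_params(vhead_file)  # LLaMA-Factory helper: loads the saved value-head weights
model.load_state_dict(vhead_params, strict=False)
model.eval()

with torch.no_grad():
    _, _, values = model(**inputs, output_hidden_states=True, return_dict=True, use_cache=False)
# values: (batch, seq_len); take the score at the last non-padding token (assumes right padding)
rewards = values.gather(dim=-1, index=(inputs["attention_mask"].sum(dim=-1, keepdim=True) - 1))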
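The `gather` line in the reproduction reads the value-head output at index `attention_mask.sum(-1) - 1`, i.e. the last real token, which is only right when sequences are right-padded. Below is a minimal sketch of a version that works for either padding side. It assumes `values` has shape (batch, seq_len), as produced by TRL's `AutoModelForCausalLMWithValueHead`. `last_token_rewards` is a hypothetical helper name, not a LLaMA-Factory API.

```python
import torch


def last_token_rewards(values: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the value-head output at the last non-padding token of each row.

    values:         (batch, seq_len) per-token scores from the value head
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    Works for left or right padding (each row must contain at least one real token).
    """
    seq_len = attention_mask.size(-1)
    positions = torch.arange(seq_len, device=attention_mask.device).expand_as(attention_mask)
    # Index of the last real token = largest position whose mask entry is 1.
    last_idx = (positions * attention_mask.long()).max(dim=-1, keepdim=True).values
    return values.gather(dim=-1, index=last_idx).squeeze(-1)


# Toy batch: row 0 is right-padded, row 1 is left-padded.
values = torch.tensor([[0.1, 0.2, 0.9, 0.0],
                       [0.0, 0.5, 0.3, 0.7]])
attention_mask = torch.tensor([[1, 1, 1, 0],
                               [0, 1, 1, 1]])
rewards = last_token_rewards(values, attention_mask)
print(rewards)
```

On this toy batch, the `attention_mask.sum(-1) - 1` index would select position 2 (0.3) for the left-padded row instead of its final real token (0.7). For right-padded batches the two approaches agree.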

Others

No response

Mar 07 '25 12:03 SFTJBD

Can I just use values[0][-1] directly?
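Regarding `values[0][-1]`: for a single unpadded sequence (batch size 1), it reads the same position as the `gather` expression, so the two agree. In a right-padded batch they diverge, because the last column of a shorter row is a padding position. With left padding, the last column is always a real token. The sketch below uses made-up tensors. `gather_last` is a hypothetical helper that reproduces the indexing from the reproduction code.

```python
import torch


def gather_last(values, attention_mask):
    # Same indexing as the reproduction: last real token under right padding.
    return values.gather(dim=-1, index=attention_mask.sum(dim=-1, keepdim=True) - 1).squeeze(-1)


# Batch of one, no padding: both approaches read the final position.
single = torch.tensor([[0.1, 0.4, 1.3]])
single_mask = torch.ones_like(single, dtype=torch.long)
print(gather_last(single, single_mask), single[:, -1])

# Right-padded batch: the last column of row 0 is a padding position.
batch = torch.tensor([[0.1, 0.4, 1.3, -2.0],
                      [0.2, 0.8, 0.5, 0.6]])
batch_mask = torch.tensor([[1, 1, 1, 0],
                           [1, 1, 1, 1]])
print(gather_last(batch, batch_mask), batch[:, -1])
```

For row 0 of the padded batch, `batch[:, -1]` returns the value computed at the pad position (-2.0 here) instead of the score at the last real token (1.3).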

Mar 17 '25 04:03 LLLeoLi

I would also like to know how to deploy it with vLLM to speed up inference.

Apr 08 '25 06:04 QUNING1

Has this been solved?

Jul 16 '25 02:07 lzp-man

Could you share complete reward model inference code? I have the same problem.

Aug 07 '25 09:08 jinzhuoran