How to add an LLM as a reward model?
I want to use a local LLM as the reward model, but verl only supports rule-based rewards and HF sequence-classification models. I have tried building an offline server with vLLM, but I don't know how to hook this server into verl.
same here
@yantijin I want to use vLLM for batch inference; calling an API is too slow for generating hundreds of responses. Have you considered that?
same here
same here
You can define a custom compute_score in a Python file, for example llm_score.py:
import requests

def compute_score(data_source, solution_str, ground_truth, extra_info):
    # Forward the sample to a local scoring server and return its score.
    resp = requests.post("http://localhost:8000/score", json={
        "data_source": data_source,
        "solution_str": solution_str,
        "ground_truth": ground_truth,
        "extra_info": extra_info,
    })
    return resp.json()["score"]
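On the server side, a minimal sketch could look like the following, assuming a FastAPI app wrapping vLLM's offline LLM class; the judge model name, prompt template, and score parsing are placeholders you would adapt to your own judge.

# reward_server.py -- hypothetical scoring endpoint matching the request above.
import re
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()
judge = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # assumed local judge model
params = SamplingParams(temperature=0.0, max_tokens=8)

class ScoreRequest(BaseModel):
    data_source: str
    solution_str: str
    ground_truth: str
    extra_info: Optional[dict] = None

@app.post("/score")
def score(req: ScoreRequest):
    # Hypothetical judge prompt: ask for a single number between 0 and 1.
    prompt = (
        "Rate how well the response matches the reference answer "
        "with a single number between 0 and 1.\n"
        f"Reference: {req.ground_truth}\nResponse: {req.solution_str}\nScore:"
    )
    out = judge.generate([prompt], params)[0].outputs[0].text
    match = re.search(r"\d+(?:\.\d+)?", out)
    return {"score": float(match.group()) if match else 0.0}

You could run it with something like uvicorn reward_server:app --port 8000 on a GPU that the trainer does not use.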
Then, in your training script, point verl at it:
python3 -m verl.trainer.main_ppo \
    ... \
    custom_reward_function.path=/path/to/llm_score.py \
    custom_reward_function.name=compute_score \
    ...
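Before launching training, it can be worth sanity-checking the endpoint with a quick call; the field values below are made-up sample inputs.

import requests

# Smoke test against the scoring server assumed at localhost:8000.
resp = requests.post("http://localhost:8000/score", json={
    "data_source": "gsm8k",
    "solution_str": "The answer is 42.",
    "ground_truth": "42",
    "extra_info": {},
})
print(resp.json()["score"])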
Thanks a lot! So the idea is to deploy an API service with vLLM on another GPU? I haven't tried this approach. How is the call efficiency?
same here