How to add an LLM as a reward model?
I want to use a local LLM as the reward model, but verl only supports rule-based rewards and HF sequence-classification models. I have tried building an offline server with vLLM, but I don't know how to hook this server into verl.
same here
@yantijin I want to use vLLM for batch inference; calling an API is too slow for generating hundreds of responses. Have you considered that?
same here
same here
You can define a custom compute_score in a Python file, for example llm_score.py:
import requests

def compute_score(data_source, solution_str, ground_truth, extra_info):
    # Forward the sample to a local scoring server and return its score.
    resp = requests.post("http://localhost:8000/score", json={
        "data_source": data_source,
        "solution_str": solution_str,
        "ground_truth": ground_truth,
        "extra_info": extra_info,
    })
    return resp.json()["score"]
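On the server side, a minimal sketch could look like the following, assuming a FastAPI app wrapping vLLM's offline LLM class; the judge model name, prompt template, and score parsing are placeholders you would adapt to your own judge.

# reward_server.py -- hypothetical scoring endpoint matching the request above.
import re
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()
judge = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # assumed local judge model
params = SamplingParams(temperature=0.0, max_tokens=8)

class ScoreRequest(BaseModel):
    data_source: str
    solution_str: str
    ground_truth: str
    extra_info: Optional[dict] = None

@app.post("/score")
def score(req: ScoreRequest):
    # Hypothetical judge prompt: ask for a single number between 0 and 1.
    prompt = (
        "Rate how well the response matches the reference answer "
        "with a single number between 0 and 1.\n"
        f"Reference: {req.ground_truth}\nResponse: {req.solution_str}\nScore:"
    )
    out = judge.generate([prompt], params)[0].outputs[0].text
    match = re.search(r"\d+(?:\.\d+)?", out)
    return {"score": float(match.group()) if match else 0.0}

You could run it with something like uvicorn reward_server:app --port 8000 on a GPU that the trainer does not use.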
Then, in your training script, point verl at it:
python3 -m verl.trainer.main_ppo \
    ... \
    custom_reward_function.path=/path/to/llm_score.py \
    custom_reward_function.name=compute_score \
    ...
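Before launching training, it can be worth sanity-checking the endpoint with a quick call; the field values below are made-up sample inputs.

import requests

# Smoke test against the scoring server assumed at localhost:8000.
resp = requests.post("http://localhost:8000/score", json={
    "data_source": "gsm8k",
    "solution_str": "The answer is 42.",
    "ground_truth": "42",
    "extra_info": {},
})
print(resp.json()["score"])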
Thanks a lot! So the idea is to deploy an API service with vLLM on another GPU? I haven't tried this approach. How is the call efficiency?
same here