verl
[Question] How to call a reward model in a rule-based reward function?
Is it possible to call a generative model, e.g. a locally deployed model or a remotely called API model, inside a reward function?
I want to use a Qwen model to generate a reward after ####. What is the best practice for doing this? I note that calling the model directly in the reward function may cause a severe efficiency problem.
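One possible shape for this, as a minimal sketch rather than verl's actual API: pull out the text after the #### marker and score a whole batch concurrently, so that slow model calls overlap instead of running one by one. The names `extract_answer`, `batch_rewards`, `judge`, and `stub_judge` are hypothetical. `stub_judge` only stands in for the real request to a Qwen model, and taking the text after the last #### is an assumption about the response format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

ANSWER_MARKER = "####"


def extract_answer(response: str) -> str:
    """Return the text after the last '####' marker, or '' if it is absent."""
    _, found, tail = response.rpartition(ANSWER_MARKER)
    return tail.strip() if found else ""


def batch_rewards(
    responses: List[str],
    references: List[str],
    judge: Callable[[str, str], float],
    max_workers: int = 16,
) -> List[float]:
    """Score every (answer, reference) pair concurrently instead of one by one.

    `judge` is a hypothetical callable; in practice it would wrap a request
    to a Qwen model served locally or behind a remote API.
    """
    answers = [extract_answer(r) for r in responses]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so rewards line up with responses.
        return list(pool.map(judge, answers, references))


def stub_judge(answer: str, reference: str) -> float:
    """Stand-in for the model call: exact match gives 1.0, otherwise 0.0."""
    return 1.0 if answer == reference else 0.0
```

Threads suit this case because each call mostly waits on network or inference I/O. Whether the training framework hands the reward function a batch or a single sample determines where such a helper could be plugged in.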
Thank you, any replies will be helpful!