How to register a new reward manager
System Info
background environment
python version 3.12.12 verl version 0.7.0.dev0 vllm version 0.11.0
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Problem
I tried to register a custom reward manager by following the existing reward manager examples. Specifically, I made the following code changes.
I created a new manager in `verl/workers/reward_manager/messy.py`:
```python
@register("messy")
class MessyRewardManager(AbstractRewardManager):
    """The reward manager."""

    def __init__(self, tokenizer, num_examine, compute_score=None, reward_fn_key="data_source") -> None:
        """Initialize the MessyRewardManager instance."""
```
Then I added it to `verl/workers/reward_manager/__init__.py`:
```python
# Copyright 2024 PRIME team and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .registry import get_reward_manager_cls, register  # noqa: I001

from .batch import BatchRewardManager
from .dapo import DAPORewardManager
from .naive import NaiveRewardManager
from .prime import PrimeRewardManager
from .messy import MessyRewardManager

# Note(haibin.lin): no need to include all reward managers here in case of complicated dependencies
__all__ = [
    "BatchRewardManager",
    "DAPORewardManager",
    "NaiveRewardManager",
    "PrimeRewardManager",
    "MessyRewardManager",
    "register",
    "get_reward_manager_cls",
]

# Import experimental reward managers to ensure they are registered
try:
    from verl.experimental.reward.reward_loop.limited import RateLimitedRewardLoopManager  # noqa: F401

    __all__.append("RateLimitedRewardLoopManager")
except ImportError:
    pass  # Optional dependency, may not be available
```
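The import of `MessyRewardManager` in `__init__.py` is what actually triggers registration: the `@register` decorator runs at import time. A minimal self-contained sketch of this pattern (placeholder names, not verl's internals):

```python
# Minimal sketch of a decorator-based registry: the class is recorded in a
# module-level dict the moment its module is imported, which is why the
# `from .messy import MessyRewardManager` line in __init__.py matters.
REWARD_MANAGER_REGISTRY: dict[str, type] = {}

def register(name: str):
    """Record the decorated class in the registry under `name`."""
    def decorator(cls):
        REWARD_MANAGER_REGISTRY[name] = cls
        return cls
    return decorator

def get_reward_manager_cls(name: str) -> type:
    """Look up a registered class; unknown names raise ValueError."""
    if name not in REWARD_MANAGER_REGISTRY:
        raise ValueError(f"Unknown reward manager: {name}")
    return REWARD_MANAGER_REGISTRY[name]

@register("messy")
class MessyRewardManager:
    pass

print(get_reward_manager_cls("messy") is MessyRewardManager)  # True
```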
But I hit the following error:

```
ManagerWorker.__init__() (pid=819859, ip=192.168.1.7, actor_id=d077f630a9ea358f8e49d71e01000000, repr=<verl.experimental.reward.reward_manager.RewardManagerWorker object at 0x7faf7dc822d0>) [repeated 7x across cluster]
(TaskRunner pid=810440) File "/home/jiangdazhi/miniconda3/envs/verl/lib/python3.12/concurrent/futures/_base.py", line 456, in result [repeated 21x across cluster]
(TaskRunner pid=810440) return self.__get_result() [repeated 21x across cluster]
(TaskRunner pid=810440) File "/home/jiangdazhi/miniconda3/envs/verl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result [repeated 21x across cluster]
(TaskRunner pid=810440) raise self._exception [repeated 21x across cluster]
(TaskRunner pid=810440) File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_manager.py", line 36, in __init__ [repeated 14x across cluster]
(TaskRunner pid=810440) self._init_reward_fn() [repeated 14x across cluster]
(TaskRunner pid=810440) File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_manager.py", line 48, in _init_reward_fn [repeated 14x across cluster]
(TaskRunner pid=810440) reward_loop_manager_cls = get_reward_loop_manager_cls(self.config.reward_model.reward_manager) [repeated 14x across cluster]
(TaskRunner pid=810440) File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_loop/registry.py", line 54, in get_reward_loop_manager_cls [repeated 14x across cluster]
(TaskRunner pid=810440) raise ValueError(f"Unknown reward loop manager: {name}") [repeated 14x across cluster]
(TaskRunner pid=810440) ValueError: Unknown reward loop manager: messy [repeated 14x across cluster]
```
Can anyone guide me? Thank you!
Expected behavior
It should work like the other reward managers.
This might be because verl now uses async mode by default, and the async agent loop uses `/verl/experimental/reward/reward_loop`. You could either set `actor_rollout_ref.rollout.mode=sync` to use the default RewardManager, or register your manager in `/verl/experimental/reward/reward_loop`.
You could try defining a custom reward function and registering the reward manager there. That works for me. So in your `custom_reward_function.path` script, you do the `.register()`.
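A minimal sketch of why that approach works: registration is an import side effect, so any `@register` decorator at module level of the custom reward script runs when the trainer loads it. Here `registry`/`register` are stand-ins for verl's actual registry, `exec` simulates the trainer importing the script at `custom_reward_function.path`, and the `compute_score` signature is an assumption:

```python
# Sketch: registration as an import side effect of loading a custom reward
# script. `registry`/`register` are placeholders, not verl's real internals.
import textwrap

registry: dict[str, type] = {}

def register(name: str):
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator

# Contents of a hypothetical custom reward script.
script = textwrap.dedent("""
    @register("messy")
    class MessyRewardManager:
        pass

    def compute_score(data_source, solution_str, ground_truth, extra_info=None):
        return 1.0 if solution_str == ground_truth else 0.0
""")

# exec stands in for the trainer importing the script file: the decorator
# runs at load time, so "messy" ends up in the registry.
namespace = {"register": register}
exec(script, namespace)

print("messy" in registry)                          # True
print(namespace["compute_score"]("src", "a", "a"))  # 1.0
```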
@jiangzizi which registration function are you using?

```python
from verl.experimental.reward.reward_loop.registry import register
# or
from verl.workers.reward_manager import register
```

The error above expects you to use the former registry. Also, which trainer class are you using?
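To illustrate the distinction (a sketch with placeholder names, not verl's actual code): the two `register` functions write into separate dictionaries, so a name registered with the workers registry is invisible to the reward-loop registry, which is exactly what the `Unknown reward loop manager: messy` error reports:

```python
# Two independent registries: registering in one does not make the name
# visible in the other. Names here are illustrative placeholders.
workers_registry: dict[str, type] = {}
reward_loop_registry: dict[str, type] = {}

def make_register(registry: dict[str, type]):
    def register(name: str):
        def decorator(cls):
            registry[name] = cls
            return cls
        return decorator
    return register

register_worker = make_register(workers_registry)
register_loop = make_register(reward_loop_registry)

@register_worker("messy")  # registered only in the workers registry
class MessyRewardManager:
    pass

print("messy" in workers_registry)      # True
print("messy" in reward_loop_registry)  # False -> "Unknown reward loop manager: messy"
```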
I met the same problem. I just want to use an API to give rewards. I used `batch` in an earlier version and it worked, but now when I set `reward_model.reward_manager=batch`, this error occurs: `ValueError: Unknown reward loop manager: batch`.