
How to register a new reward manager

Open jiangzizi opened this issue 3 weeks ago • 2 comments

System Info

background environment

python version: 3.12.12
verl version: 0.7.0.dev0
vllm version: 0.11.0

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Problem

I tried to register a custom reward manager by following the existing reward manager examples. Specifically, I made the following code changes.

I created a new manager in verl/workers/reward_manager/messy.py:

@register("messy")
class MessyRewardManager(AbstractRewardManager):
    """The reward manager."""

    def __init__(self, tokenizer, num_examine, compute_score=None, reward_fn_key="data_source") -> None:
        """Initialize the MessyRewardManager instance."""
        self.tokenizer = tokenizer
        self.num_examine = num_examine  # number of batches printed to console for debugging
        self.compute_score = compute_score
        self.reward_fn_key = reward_fn_key

I added it to verl/workers/reward_manager/__init__.py:

# Copyright 2024 PRIME team and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .registry import get_reward_manager_cls, register  # noqa: I001
from .batch import BatchRewardManager
from .dapo import DAPORewardManager
from .naive import NaiveRewardManager
from .prime import PrimeRewardManager
from .messy import MessyRewardManager

# Note(haibin.lin): no need to include all reward managers here in case of complicated dependencies
__all__ = [
    "BatchRewardManager",
    "DAPORewardManager",
    "NaiveRewardManager",
    "PrimeRewardManager",
    "MessyRewardManager",
    "register",
    "get_reward_manager_cls",
]

# Import experimental reward managers to ensure they are registered
try:
    from verl.experimental.reward.reward_loop.limited import RateLimitedRewardLoopManager  # noqa: F401

    __all__.append("RateLimitedRewardLoopManager")
except ImportError:
    pass  # Optional dependency, may not be available
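For context, the register / get_reward_manager_cls pair used above is a plain decorator-based class registry. A minimal self-contained sketch of the pattern (illustrative only, not verl's actual implementation):

```python
# Minimal decorator-based class registry, mimicking the shape of
# register / get_reward_manager_cls above. Illustrative sketch only.
REWARD_MANAGER_REGISTRY: dict[str, type] = {}

def register(name: str):
    """Record the decorated class under `name` and return it unchanged."""
    def decorator(cls: type) -> type:
        REWARD_MANAGER_REGISTRY[name] = cls
        return cls
    return decorator

def get_reward_manager_cls(name: str) -> type:
    if name not in REWARD_MANAGER_REGISTRY:
        # Same failure mode as the traceback below: the name was looked up
        # in a registry that the class was never registered into.
        raise ValueError(f"Unknown reward manager: {name}")
    return REWARD_MANAGER_REGISTRY[name]

@register("messy")
class MessyRewardManager:
    pass

assert get_reward_manager_cls("messy") is MessyRewardManager
```

The key property of this pattern is that the decorator only runs when the defining module is imported, which is why messy.py must be imported from __init__.py. It also means that if the codebase has more than one such registry, registering in one does not make the name visible in the other.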

But I met the following error:

ManagerWorker.__init__() (pid=819859, ip=192.168.1.7, actor_id=d077f630a9ea358f8e49d71e01000000, repr=<verl.experimental.reward.reward_manager.RewardManagerWorker object at 0x7faf7dc822d0>) [repeated 7x across cluster]
(TaskRunner pid=810440)   File "/home/jiangdazhi/miniconda3/envs/verl/lib/python3.12/concurrent/futures/_base.py", line 456, in result [repeated 21x across cluster]
(TaskRunner pid=810440)     return self.__get_result() [repeated 21x across cluster]
(TaskRunner pid=810440)            ^^^^^^^^^^^^^^^^^^^ [repeated 21x across cluster]
(TaskRunner pid=810440)   File "/home/jiangdazhi/miniconda3/envs/verl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result [repeated 21x across cluster]
(TaskRunner pid=810440)     raise self._exception [repeated 21x across cluster]
(TaskRunner pid=810440)            ^^^^^^^^^^^^^^^^^^^^^ [repeated 14x across cluster]
(TaskRunner pid=810440)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 14x across cluster]
(TaskRunner pid=810440)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 28x across cluster]
(TaskRunner pid=810440)   File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_manager.py", line 36, in __init__ [repeated 14x across cluster]
(TaskRunner pid=810440)     self._init_reward_fn() [repeated 14x across cluster]
(TaskRunner pid=810440)   File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_manager.py", line 48, in _init_reward_fn [repeated 14x across cluster]
(TaskRunner pid=810440)     reward_loop_manager_cls = get_reward_loop_manager_cls(self.config.reward_model.reward_manager) [repeated 14x across cluster]
(TaskRunner pid=810440)                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [repeated 14x across cluster]
(TaskRunner pid=810440)   File "/home/disk1/jiangdazhi/code/research/verl/verl/experimental/reward/reward_loop/registry.py", line 54, in get_reward_loop_manager_cls [repeated 14x across cluster]
(TaskRunner pid=810440)     raise ValueError(f"Unknown reward loop manager: {name}") [repeated 14x across cluster]
(TaskRunner pid=810440) ValueError: Unknown reward loop manager: messy [repeated 14x across cluster]

Can anyone guide me? Thank you!

Expected behavior

It should work like the other reward managers.

jiangzizi avatar Dec 02 '25 14:12 jiangzizi

This might be because verl now uses async mode by default, and the agent loop resolves the reward manager through verl/experimental/reward/reward_loop. You could either set actor_rollout_ref.rollout.mode=sync to use the default RewardManager registry, or register your manager in verl/experimental/reward/reward_loop instead.
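If the async reward loop is indeed the cause, the sync-mode override would be passed as a Hydra-style flag on the training command. A sketch, assuming the standard main_ppo entry point and the "messy" manager name from this issue:

```shell
# Untested sketch (assumes the standard main_ppo entry point):
# force synchronous rollout so the default reward-manager registry
# (verl/workers/reward_manager) is used instead of the experimental
# reward_loop registry.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.mode=sync \
    reward_model.reward_manager=messy
    # ...plus your other usual training options
```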

ChenyuWang1022 avatar Dec 03 '25 12:12 ChenyuWang1022

You could try defining a custom reward function and registering the reward manager there. That works for me: in the script pointed to by custom_reward_function.path, call .register() at module level.
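The mechanism behind this workaround can be demonstrated without verl at all: a module-level registration is just an import-time side effect, so placing it in a file that gets imported on every worker guarantees it runs there. A self-contained demo with a stand-in registry (the registry and file names are illustrative, not verl's API):

```python
# Sketch of why the custom_reward_function.path workaround works: the file
# is imported on each worker, and a module-level register call is a side
# effect of that import. Stand-in registry, no verl dependency.
import importlib.util
import os
import tempfile
import textwrap

REGISTRY = {}  # stand-in for verl's reward-manager registry

# Pretend this source is the file custom_reward_function.path points at.
module_src = textwrap.dedent("""
    REGISTRY["messy"] = "MessyRewardManager"  # stand-in for @register("messy")

    def compute_score(solution_str, ground_truth, **kwargs):
        return float(solution_str.strip() == ground_truth.strip())
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(module_src)
    path = f.name

spec = importlib.util.spec_from_file_location("custom_reward_fn", path)
mod = importlib.util.module_from_spec(spec)
mod.REGISTRY = REGISTRY           # share the registry with the module
spec.loader.exec_module(mod)      # importing runs the registration side effect

assert REGISTRY["messy"] == "MessyRewardManager"
assert mod.compute_score("42", " 42 ") == 1.0
os.unlink(path)
```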

cdm233 avatar Dec 03 '25 16:12 cdm233

@jiangzizi which registration function are you using?

from verl.experimental.reward.reward_loop.registry import register
# or
from verl.workers.reward_manager import register

The error above indicates verl expected you to use the former registry. Also, which trainer class are you using?

garrett361 avatar Dec 04 '25 22:12 garrett361

I met the same problem. I just want to use an API to give rewards. I used batch in an earlier version and it worked, but now when I set reward_model.reward_manager=batch, an error occurs: ValueError: Unknown reward loop manager: batch

SupreCyk avatar Dec 15 '25 08:12 SupreCyk