reward-bench
RewardBench: the first evaluation tool for reward models.
Need to get legal approval to use the Gemini API, but in the meantime try this: https://openrouter.ai/docs#quick-start
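For reference, a minimal sketch of what that quick-start boils down to: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a plain HTTP call works. The Gemini model id below is just an example and may differ; check https://openrouter.ai/models for the current list.

```python
# Minimal sketch of the OpenRouter quick-start, assuming an OPENROUTER_API_KEY
# environment variable is set. The model id is an example, not a guarantee.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-pro-1.5",  # example id; verify on openrouter.ai/models
        "messages": [{"role": "user", "content": "Which response is better? ..."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```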
Currently `run_rm.py` only uses one GPU because RMs are generally not well supported for inference. The current implementation is a separate `run_rm_mpgu.py` script. We can delete this and improve the base...
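One possible direction for folding multi-GPU support into the base script is accelerate's `device_map="auto"` sharding. This is a rough sketch under that assumption, not the repo's current implementation, and the model id is a placeholder.

```python
# Sketch: shard a sequence-classification RM across all visible GPUs with
# device_map="auto" (requires accelerate). Not the repo's implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "some-org/some-reward-model"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",           # spread layers over available GPUs
    torch_dtype=torch.bfloat16,  # reduce per-GPU memory
)

inputs = tokenizer("example completion to score", return_tensors="pt").to(model.device)
with torch.no_grad():
    # assumes a single-scalar reward head (num_labels == 1)
    score = model(**inputs).logits.squeeze().item()
print(score)
```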
I evaluated gemma-2-27b-it `{'Chat': 0.8938547486033519, 'Chat Hard': 0.6085526315789473, 'Safety': 0.8867946647946647, 'Reasoning': 0.7705588066786708}` while...
I noticed that, for default (sequence classification) models with a chat template defined in the tokenizer, `scripts/run_rm.py` formats each conversation with `tokenizer.apply_chat_template` (via the function [`prepare_dialogue_from_tokenizer`](https://github.com/allenai/reward-bench/blob/bc72fb2a573fc31c614eef3405d354b398977b02/rewardbench/utils.py#L515)) and then uses the text...
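To make the formatting step above concrete, here is a minimal sketch of what that amounts to, using any tokenizer that ships a chat template (the model id and strings are just examples, not the repo's test data):

```python
# Sketch of the formatting step described above: render the full conversation
# (prompt + completion) with the tokenizer's own chat template, as
# prepare_dialogue_from_tokenizer does. Model id is only an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

prompt = "What is the capital of France?"
chosen, rejected = "Paris.", "London."

text_chosen = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}, {"role": "assistant", "content": chosen}],
    tokenize=False,
)
text_rejected = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}, {"role": "assistant", "content": rejected}],
    tokenize=False,
)
print(text_chosen)
```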
Hi RewardBench Team, We have updated an 8B reward model (Custom Classifier) [general-preference/GPM-Llama-3.1-8B](https://huggingface.co/general-preference/GPM-Llama-3.1-8B) and a 2B reward model (Custom Classifier) [general-preference/GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B). Local evaluation results for our models are listed as...
Hi Nathan, I’m currently preparing to release a new repository that contains the code used in my paper. As part of our experiments, we made some slight modifications to the...
See @zankner's repo https://github.com/zankner/CLoud, RMs that think out loud!
Unfortunately also fails with:
```
Traceback (most recent call last):
  File "/weka/oe-adapt-default/nathanl/reward-bench/scripts/run_rm.py", line 401, in <module>
    main()
  File "/weka/oe-adapt-default/nathanl/reward-bench/scripts/run_rm.py", line 309, in main
    rewards_chosen = reward_pipe(batch["text_chosen"], **reward_pipeline_kwargs)
TypeError: list indices must...
```
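That `TypeError` usually means `batch` arrived as a list of example dicts rather than a dict of columns, so indexing it with a string key fails. A hedged sketch of the failure mode and one workaround; the variable names mirror the traceback, the data is illustrative:

```python
# Sketch: if the DataLoader yields a list of example dicts (e.g. with a
# pass-through collate_fn), indexing by column name raises
# "TypeError: list indices must be integers or slices, not str".
batch = [
    {"text_chosen": "good answer", "text_rejected": "bad answer"},
    {"text_chosen": "another good answer", "text_rejected": "another bad answer"},
]

# batch["text_chosen"]  # -> TypeError: list indices must be integers or slices, not str

# One workaround: pull the column out explicitly before calling the pipeline.
texts_chosen = [example["text_chosen"] for example in batch]
# rewards_chosen = reward_pipe(texts_chosen, **reward_pipeline_kwargs)
print(texts_chosen)
```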