OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)
I want to use a 70B-parameter model as my reward model. It is inefficient to load such a model from pretrained weights, and ideally it should be queried through an API. However,...
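One way to serve a large reward model behind an API is a thin client that posts (prompt, response) pairs to a scoring endpoint. The sketch below is hypothetical — the endpoint URL, JSON schema, and class name are assumptions, not part of OpenRLHF — and makes the transport injectable so it can be exercised without a running server.

```python
import json
import urllib.request


class RemoteRewardModel:
    """Hypothetical client that scores (prompt, response) pairs over HTTP
    instead of loading the 70B reward model locally.

    The endpoint and payload format are illustrative assumptions."""

    def __init__(self, endpoint, transport=None):
        self.endpoint = endpoint
        # The transport is injectable so the client can be tested offline.
        self.transport = transport or self._http_post

    def _http_post(self, payload):
        # POST the JSON payload and decode the JSON response.
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def score(self, prompts, responses):
        """Return one scalar reward per (prompt, response) pair."""
        payload = {"prompts": prompts, "responses": responses}
        return self.transport(payload)["rewards"]
```

In a trainer, this client would stand in wherever a local reward model's forward pass is called, so only the scoring endpoint needs the 70B weights in memory.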
Hi team, I am getting the following error while enabling 4-bit quantization and LoRA:
```
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
  self._configure_distributed_model(model)
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
  self.module.to(self.device)
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2670,...
```
I have a use case where I'd like to use a custom `ExperienceMaker` class instead of either of the provided ones. As far as I can tell, there isn't currently...
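One common way to support this without forking the trainer is dependency injection: the trainer accepts any experience-maker object rather than hard-coding one class. The sketch below is hypothetical — the base-class name, `make_experience` signature, and experience fields are illustrative stand-ins and do not match OpenRLHF's actual `ExperienceMaker` API.

```python
class ExperienceMaker:
    """Minimal stand-in for a framework base class (hypothetical)."""

    def make_experience(self, prompts):
        raise NotImplementedError


class NaiveExperienceMaker(ExperienceMaker):
    """Illustrative default: one experience dict per prompt."""

    def make_experience(self, prompts):
        return [{"prompt": p, "response": "", "reward": 0.0} for p in prompts]


class FilteredExperienceMaker(NaiveExperienceMaker):
    """Example custom maker: drop experiences below a reward threshold."""

    def __init__(self, min_reward=0.0):
        self.min_reward = min_reward

    def make_experience(self, prompts):
        exps = super().make_experience(prompts)
        return [e for e in exps if e["reward"] >= self.min_reward]


class Trainer:
    """Sketch: the trainer takes any ExperienceMaker via its constructor,
    so user-defined subclasses plug in without modifying the trainer."""

    def __init__(self, experience_maker):
        self.experience_maker = experience_maker
```

If the framework exposed such a constructor parameter, a custom subclass could be passed in directly instead of patching the provided classes.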
The script is modified as follows, with the checkpoint (ckpt) replaced by Qwen:
I am trying to apply RLHF to a text classification task. You can imagine that the text classification model, i.e. the policy model here, is an `emotion classification` model. The pretrained model can output...
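For a setup like this, one possible reward signal is the classifier's probability for the desired emotion label: higher confidence in the target class yields a higher scalar reward. The helper below is an illustrative sketch, not part of OpenRLHF; `logits` stands in for whatever the classification head produces.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_reward(logits, target_index):
    """Use the classifier's probability for the target emotion as the
    scalar reward for a generated sample (hypothetical helper)."""
    return softmax(logits)[target_index]
```

With equal logits over two classes, the reward for either class is 0.5; as the policy drifts toward the target emotion, the reward rises toward 1.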
**What happened + What you expected to happen:**

**Operation process:**

```
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
```

**Head started successfully:**

> Usage stats collection is enabled. To disable this, add...
I used a large model (> 170B) as my reward model. At the very beginning, the loss is normal, but after training for one step it becomes NaN. This situation didn't...