OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)
I want to use a 70B-parameter model as my reward model. It is inefficient to load such a model from pretrained weights, and ideally it should be queried through an API. However,...
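One way to serve a large reward model behind an API is a thin client that posts (prompt, response) pairs to a scoring endpoint. The sketch below is hypothetical — the endpoint URL, JSON schema, and class name are assumptions, not part of OpenRLHF — and makes the transport injectable so it can be exercised without a running server.

```python
import json
import urllib.request


class RemoteRewardModel:
    """Hypothetical client that scores (prompt, response) pairs over HTTP
    instead of loading the 70B reward model locally.

    The endpoint and payload format are illustrative assumptions."""

    def __init__(self, endpoint, transport=None):
        self.endpoint = endpoint
        # The transport is injectable so the client can be tested offline.
        self.transport = transport or self._http_post

    def _http_post(self, payload):
        # POST the JSON payload and decode the JSON response.
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def score(self, prompts, responses):
        """Return one scalar reward per (prompt, response) pair."""
        payload = {"prompts": prompts, "responses": responses}
        return self.transport(payload)["rewards"]
```

In a trainer, this client would stand in wherever a local reward model's forward pass is called, so only the scoring endpoint needs the 70B weights in memory.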
Hi team, I am getting the following error while enabling 4-bit quantization and LoRA:
```
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
  self._configure_distributed_model(model)
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
  self.module.to(self.device)
File "/root/miniconda3/envs/open/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2670,...
```
I have a use case where I'd like to use a custom `ExperienceMaker` class instead of either of the provided ones. As far as I can tell, there isn't currently...
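One common way to support this without forking the trainer is dependency injection: the trainer accepts any experience-maker object rather than hard-coding one class. The sketch below is hypothetical — the base-class name, `make_experience` signature, and experience fields are illustrative stand-ins and do not match OpenRLHF's actual `ExperienceMaker` API.

```python
class ExperienceMaker:
    """Minimal stand-in for a framework base class (hypothetical)."""

    def make_experience(self, prompts):
        raise NotImplementedError


class NaiveExperienceMaker(ExperienceMaker):
    """Illustrative default: one experience dict per prompt."""

    def make_experience(self, prompts):
        return [{"prompt": p, "response": "", "reward": 0.0} for p in prompts]


class FilteredExperienceMaker(NaiveExperienceMaker):
    """Example custom maker: drop experiences below a reward threshold."""

    def __init__(self, min_reward=0.0):
        self.min_reward = min_reward

    def make_experience(self, prompts):
        exps = super().make_experience(prompts)
        return [e for e in exps if e["reward"] >= self.min_reward]


class Trainer:
    """Sketch: the trainer takes any ExperienceMaker via its constructor,
    so user-defined subclasses plug in without modifying the trainer."""

    def __init__(self, experience_maker):
        self.experience_maker = experience_maker
```

If the framework exposed such a constructor parameter, a custom subclass could be passed in directly instead of patching the provided classes.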
The script is modified as follows, with the checkpoint (ckpt) replaced by Qwen:
I am trying to apply RLHF to a text classification task. You can imagine that the text classification model, i.e. the policy model here, is an `emotion classification` model. The pretrained model can output...
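For a setup like this, one possible reward signal is the classifier's probability for the desired emotion label: higher confidence in the target class yields a higher scalar reward. The helper below is an illustrative sketch, not part of OpenRLHF; `logits` stands in for whatever the classification head produces.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_reward(logits, target_index):
    """Use the classifier's probability for the target emotion as the
    scalar reward for a generated sample (hypothetical helper)."""
    return softmax(logits)[target_index]
```

With equal logits over two classes, the reward for either class is 0.5; as the policy drifts toward the target emotion, the reward rises toward 1.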
**What happened + What you expected to happen:**

**Operation process:**

```
ray start --head --node-ip-address 0.0.0.0 --num-gpus 8
```

**Head started successfully:**

> Usage stats collection is enabled. To disable this, add...
I used a large model (> 170B) as my reward model. At the very beginning, the loss is normal, but after training for one step it becomes NaN. This situation didn't...