OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (Support 70B+ full tuning & LoRA & Mixtral & KTO)

Results 42 OpenRLHF issues

Thanks for the wonderful project! Does OpenRLHF support the latest version of vllm and newer NGC releases (like 24.02), now that vllm has been updated to v0.4.1? Or...

When training a 13B model with PPO, memory usage is extremely high. How should I resolve this?

```
deepspeed ./train_ppo.py \
    --pretrain OpenLLMAI/Llama-2-7b-sft-model-ocra-500k \
    --reward_pretrain OpenLLMAI/Llama-2-7b-rm-anthropic_hh-lmsys-oasst-webgpt \
    --save_path ./ckpt/7b_llama \
    --save_steps -1 \
    --logging_steps 1 \
    --eval_steps -1 \
    --micro_train_batch_size 2 \
    --train_batch_size 128 \
    --micro_rollout_batch_size 4...
```

Team, thank you so much for this wonderful toolkit! We are trying to test the vllm setting with the mistralai/Mistral-7B-Instruct-v0.2 model with zero2 ![image](https://github.com/OpenLLMAI/OpenRLHF/assets/35610230/b97439b6-ee2f-4598-9134-74ec075b9ef5) ray job submit --address="http://127.0.0.1:8265" \ --runtime-env-json='{"working_dir": "/openrlhf",...

Hello, I want to run train_ppo_llama_ray.sh on 4 RTX 4090 GPUs. Should I modify actor_num_gpus_per_node/critic_num_gpus_per_node in train_ppo_llama_ray.sh? As the default script is for 8 GPUs, what else should I pay...

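I added a code dataset during model fine-tuning, which gave the model decent coding ability. When training the reward model in the RLHF stage, do I need to include the code dataset again? If I don't, will the model's coding ability degrade?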

documentation

Hello, I want to run train_ppo_llama.sh on 4 A100 80G. Do I need to reconfigure the GPU allocation of the 4 models?

I think this issue relates to #217 and #218. Some models, for example `facebook/opt-1.3b`, don't accept `position_ids` as an argument, but the current implementation passes it (used [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/actor.py#L176), [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/model.py#L201) and [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/model.py#L262))...
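One possible way to guard against the `position_ids` mismatch described above is to inspect a callable's signature before passing optional arguments. The snippet below is a minimal sketch of that idea, not OpenRLHF's actual fix: the helpers `accepts_argument` and `call_with_supported_kwargs` are hypothetical names, and the two `forward_*` functions are toy stand-ins for model forward methods.

```python
import inspect


def accepts_argument(fn, name):
    """Return True if `fn` declares `name` or a catch-all **kwargs parameter."""
    params = inspect.signature(fn).parameters
    has_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return name in params or has_var_kwargs


def call_with_supported_kwargs(fn, **kwargs):
    """Call `fn`, silently dropping keyword arguments it does not declare."""
    supported = {k: v for k, v in kwargs.items() if accepts_argument(fn, k)}
    return fn(**supported)


# Hypothetical stand-in for a forward method that takes position_ids.
def forward_with_position_ids(input_ids, attention_mask=None, position_ids=None):
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }


# Hypothetical stand-in for a forward method that does not take position_ids.
def forward_without_position_ids(input_ids, attention_mask=None):
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = {
    "input_ids": [5, 6, 7],
    "attention_mask": [1, 1, 1],
    "position_ids": [0, 1, 2],
}

# Both calls succeed; position_ids is only forwarded where it is declared.
with_ids = call_with_supported_kwargs(forward_with_position_ids, **batch)
without_ids = call_with_supported_kwargs(forward_without_position_ids, **batch)
```

A caveat on this design: a function that declares `**kwargs` is treated as accepting everything, so a forward method that takes `**kwargs` but rejects `position_ids` further down the call chain would still fail.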