OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (supports 70B+ full tuning, LoRA, Mixtral & KTO)

42 OpenRLHF issues

Hi team, it would be great if KubeRay commands for running OpenRLHF were added to the docs, to make cold-start setup easier.

documentation
enhancement
P0

When I set `save_step` to a value other than -1, the program raises an exception:

```
self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'
```

https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385 These three...
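A minimal workaround sketch, assuming the crash is simply that the training script never registers the three arguments the checkpoint path reads (the attribute names come from the traceback itself; the defaults below are illustrative, not OpenRLHF's actual defaults):

```python
# Sketch: register the checkpoint arguments that ppo_trainer.py reads
# via args.ckpt_path / args.max_ckpt_num / args.max_ckpt_mem.
import argparse

parser = argparse.ArgumentParser()
# ... existing OpenRLHF arguments ...
parser.add_argument("--ckpt_path", type=str, default="./ckpt",
                    help="directory for DeepSpeed checkpoints (read as args.ckpt_path)")
parser.add_argument("--max_ckpt_num", type=int, default=3,
                    help="keep at most this many checkpoints (read as args.max_ckpt_num)")
parser.add_argument("--max_ckpt_mem", type=int, default=1000,
                    help="disk budget for checkpoints (read as args.max_ckpt_mem; illustrative)")
args = parser.parse_args()
```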

The training configuration is as follows:

```
--ref_num_nodes 1
--ref_num_gpus_per_node 2
--reward_num_nodes 1
--reward_num_gpus_per_node 2
--critic_num_nodes 1
--critic_num_gpus_per_node 4
--actor_num_nodes 2
--actor_num_gpus_per_node 8
--vllm_num_engines 2
--vllm_tensor_parallel_size 4
--micro_train_batch_size 4
--train_batch_size 64
--micro_rollout_batch_size 4
--rollout_batch_size 64
...
```
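For reference, a global/micro batch relationship in a setup like this usually has to satisfy the standard divisibility constraint of data-parallel training; a quick sanity check using only the numbers quoted above (a sketch; the constraint is the usual DeepSpeed/PPO convention, not a statement about OpenRLHF internals):

```python
# Sanity-check the batch sizes from this config.
actor_world_size = 2 * 8          # actor_num_nodes * actor_num_gpus_per_node
micro_train_batch_size = 4
train_batch_size = 64

# The global batch must split evenly across ranks and micro-batches.
assert train_batch_size % (micro_train_batch_size * actor_world_size) == 0
accum_steps = train_batch_size // (micro_train_batch_size * actor_world_size)
print(f"gradient accumulation steps per optimizer step: {accum_steps}")  # -> 1
```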

enhancement
help wanted

Is ORPO being considered as a new feature? ORPO, a technique that replaces SFT+DPO/PPO, was released recently. I saw @_philschmid's post about it yesterday. I gave ORPO a shot with phi-2 and...
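For context, ORPO folds preference optimization into the SFT objective via an odds-ratio penalty, so no reference model or separate RL stage is needed. A minimal sketch of the odds-ratio term following the ORPO paper's formulation (the function name and `beta` weight are illustrative):

```python
import torch
import torch.nn.functional as F

def orpo_odds_ratio_loss(chosen_logps, rejected_logps, beta=0.1):
    """Sketch of the ORPO odds-ratio term.

    chosen_logps / rejected_logps: length-normalized sequence log-probs,
    i.e. the mean token log-prob of the chosen/rejected responses.
    """
    # odds(y|x) = p / (1 - p); compute in log space for stability:
    # log odds = logp - log(1 - exp(logp))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    ratio = log_odds_chosen - log_odds_rejected
    return -beta * F.logsigmoid(ratio).mean()

# Full ORPO objective = NLL (SFT) loss on the chosen response + the term above.
```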

`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu!` Several places in the code put the EMA model on a GPU device: ``` if...
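One common way to avoid this class of error is to make the EMA update device-agnostic instead of assuming both model copies share a device. A minimal sketch of that pattern (not OpenRLHF's actual implementation; names are illustrative):

```python
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """Device-safe EMA update: the EMA copy may live on CPU (e.g. to save
    GPU memory), so move each source parameter to the EMA parameter's
    device before mixing, rather than assuming a shared device."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        src = p.detach().to(ema_p.device)
        ema_p.mul_(decay).add_(src, alpha=1.0 - decay)
```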

From the code, the tokenizer of the reward model seems to be the same as that of the policy model?
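If the two models did use different tokenizers, the usual pattern would be to decode the policy's token ids back to text and re-encode with the reward model's own tokenizer before scoring. A sketch of that pattern (the model identifiers and scoring call are placeholders, not OpenRLHF code):

```python
from transformers import AutoTokenizer

policy_tok = AutoTokenizer.from_pretrained("policy-model")   # placeholder id
reward_tok = AutoTokenizer.from_pretrained("reward-model")   # placeholder id

def rescore_with_rm_tokenizer(policy_ids, reward_model, device):
    # Decode with the policy tokenizer, then re-encode for the reward model.
    texts = policy_tok.batch_decode(policy_ids, skip_special_tokens=True)
    batch = reward_tok(texts, return_tensors="pt", padding=True).to(device)
    return reward_model(**batch)  # output shape depends on the RM's head
```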

I see in `RemoteExperienceMaker._generate_vllm()`, [line 375](https://github.com/OpenLLMAI/OpenRLHF/blob/4e15591a5abd19a14e4a72415603bce76c3e1567/openrlhf/trainer/ppo_utils/experience_maker.py#L375), that for generations that don't finish, i.e. don't emit the EOS token within the max token limit, we manually set the last token to...
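For readers unfamiliar with that line, the general pattern being questioned looks roughly like this: when a generation hits the max-token limit without emitting EOS, the final token is overwritten with EOS so downstream reward/value computation sees a terminated sequence. A sketch of the technique, assuming right-padded sequences (illustrative, not the exact OpenRLHF code; see experience_maker.py for the real handling):

```python
import torch

def force_eos_on_truncated(sequences, eos_token_id, pad_token_id):
    """For each right-padded row that did not end with EOS, overwrite
    its last non-pad token with eos_token_id."""
    not_pad = sequences.ne(pad_token_id)
    last_idx = not_pad.sum(dim=1) - 1            # last non-pad position per row
    rows = torch.arange(sequences.size(0))
    unfinished = sequences[rows, last_idx] != eos_token_id
    sequences[rows[unfinished], last_idx[unfinished]] = eos_token_id
    return sequences
```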