OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (supports 70B+ full tuning & LoRA & Mixtral & KTO)
Hi Team, it would be great if KubeRay commands for running OpenRLHF were added to the docs, to make the cold-start setup easier.
When I set `save_step` to a value other than -1, the program raises an exception:
```
self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'
```
https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385 These three...
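For anyone hitting this before a fix lands, here is a minimal sketch of a workaround, assuming the launch script builds `args` with `argparse`: declare the three attributes the trainer reads. The flag names are copied from the traceback; the defaults are placeholders, not the project's official values.
```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical workaround: define the checkpoint arguments the trainer
# expects on args, so save_step != -1 no longer hits missing attributes.
parser.add_argument("--ckpt_path", type=str, default="./ckpt/ppo",
                    help="directory for intermediate checkpoints (placeholder default)")
parser.add_argument("--max_ckpt_num", type=int, default=3,
                    help="keep at most this many checkpoints (placeholder default)")
parser.add_argument("--max_ckpt_mem", type=int, default=1000,
                    help="approximate disk budget for kept checkpoints (placeholder default)")
args = parser.parse_args()
```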
The training configuration is as follows:
```
--ref_num_nodes 1 --ref_num_gpus_per_node 2
--reward_num_nodes 1 --reward_num_gpus_per_node 2
--critic_num_nodes 1 --critic_num_gpus_per_node 4
--actor_num_nodes 2 --actor_num_gpus_per_node 8
--vllm_num_engines 2 --vllm_tensor_parallel_size 4
--micro_train_batch_size 4 --train_batch_size 64
--micro_rollout_batch_size 4 --rollout_batch_size 64...
```
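For reference, a quick tally (plain Python, not OpenRLHF code) of the GPUs this configuration requests per role:
```python
# GPUs implied by the flags above: nodes * gpus_per_node per role,
# and num_engines * tensor_parallel_size for the vLLM engines.
gpus = {
    "ref":    1 * 2,
    "reward": 1 * 2,
    "critic": 1 * 4,
    "actor":  2 * 8,
    "vllm":   2 * 4,
}
print(gpus, "total:", sum(gpus.values()))  # total: 32
```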
Is ORPO being considered as a new feature? ORPO, a technique that replaces SFT+DPO/PPO, was released recently. I saw @_philschmid's post about it yesterday. Gave ORPO a shot with phi-2 and...
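For context, here is a minimal PyTorch sketch of the ORPO objective as described in the paper; this is not OpenRLHF code, and the weight `lam` and the length-normalized log-probs are assumptions about how the inputs would be prepared:
```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_*: length-normalized sequence log-probs, i.e. the mean
    # per-token log P(y|x) of each response in the batch.
    def log_odds(logp):
        # log(p / (1 - p)) computed stably from log p
        return logp - torch.log1p(-torch.exp(logp))

    # odds-ratio preference term: -log sigmoid(log_odds(w) - log_odds(l))
    l_or = -F.logsigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))
    # ORPO = SFT NLL on the chosen response + weighted odds-ratio term
    return (nll_chosen + lam * l_or).mean()
```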
Single-node DeepSpeed launch
`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu!` Several places in the code put the EMA model on a GPU device:
```
if...
```
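A hedged sketch of one way to avoid the mismatch, assuming the usual EMA update loop: move each source parameter to the EMA parameter's device before the in-place update. The function name here is illustrative, not the project's:
```python
import torch

@torch.no_grad()
def update_ema(model, ema_model, beta=0.992):
    # ema = beta * ema + (1 - beta) * p, done on the EMA tensor's own
    # device, so cuda:2 parameters never mix with a CPU-resident EMA copy.
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.lerp_(p.detach().to(ema_p.device), 1.0 - beta)
```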
From the code, the tokenizer of the reward model seems to be the same as the policy model's?
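If the two models do ship with different tokenizers, a safer (hypothetical) pattern is to decode with the policy tokenizer and re-encode with the reward model's own tokenizer before scoring, as in this sketch:
```python
import torch

def score_with_reward_model(generated_ids, policy_tok, reward_tok, reward_model):
    # Decode with the tokenizer the tokens were produced by, then
    # re-tokenize for the reward model so its vocabulary/ids line up.
    text = policy_tok.decode(generated_ids, skip_special_tokens=True)
    inputs = reward_tok(text, return_tensors="pt")
    with torch.no_grad():
        return reward_model(**inputs)
```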
I see in `RemoteExperienceMaker._generate_vllm()`, [line 375](https://github.com/OpenLLMAI/OpenRLHF/blob/4e15591a5abd19a14e4a72415603bce76c3e1567/openrlhf/trainer/ppo_utils/experience_maker.py#L375), that for generations that don't finish, i.e. don't output the EOS token within the max token limit, we manually set the last token to...
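For readers skimming, the pattern being questioned looks roughly like this sketch (my paraphrase of forcing a terminal EOS, not the actual OpenRLHF code):
```python
import torch

def force_eos(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # For rows that hit the max-token limit without emitting EOS,
    # overwrite the final position with the EOS id in place, so
    # downstream reward/value code sees a terminated sequence.
    unfinished = sequences[:, -1] != eos_token_id
    sequences[unfinished, -1] = eos_token_id
    return sequences
```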