OpenRLHF

An easy-to-use, scalable, and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)

42 OpenRLHF issues

Hi team, while using the PPO pipeline we sometimes observe spikes in response length, and we were curious whether any techniques related to a length penalty are available or have been explored.
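One common mitigation, shown here as a minimal sketch rather than a built-in OpenRLHF API (the function name and coefficients are hypothetical), is to subtract a penalty from the reward for tokens generated beyond a target length:

```python
import torch

def apply_length_penalty(rewards: torch.Tensor,
                         response_lengths: torch.Tensor,
                         max_len: int = 512,
                         coef: float = 0.01) -> torch.Tensor:
    # Hypothetical helper: linearly penalize tokens produced past max_len.
    overflow = (response_lengths - max_len).clamp(min=0).float()
    return rewards - coef * overflow
```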

If I understand the current PPO code correctly, this instantiates completely separate actor and critic models, with no layers shared between them. (But correct me if that is wrong.)...
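For readers skimming the thread, here is a minimal sketch of the two designs being contrasted; the classes and the "gpt2" checkpoint are placeholders, not OpenRLHF's actual implementation:

```python
import torch.nn as nn
from transformers import AutoModel

# (a) Fully separate actor and critic: two independent backbones, no shared layers.
actor_backbone = AutoModel.from_pretrained("gpt2")
critic_backbone = AutoModel.from_pretrained("gpt2")
critic_value_head = nn.Linear(critic_backbone.config.hidden_size, 1)

# (b) Shared-backbone alternative: one transformer with separate policy/value heads.
shared = AutoModel.from_pretrained("gpt2")
policy_head = nn.Linear(shared.config.hidden_size, shared.config.vocab_size)
value_head = nn.Linear(shared.config.hidden_size, 1)
```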

I noticed that RemoteExperienceMaker left-pads the input sequences even when using vLLM for generation: https://github.com/OpenLLMAI/OpenRLHF/blob/dcd379a44eea56625626d1a0832cd3eeda048b21/openrlhf/trainer/ppo_utils/experience_maker.py#L346 I can see that a few lines down, `self.actor.process_sequences()` assumes this left-padding, as it calculates an...
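For context, a minimal sketch of what left-padding looks like and why end-relative indexing depends on it (the helper name is illustrative, not OpenRLHF's code):

```python
import torch
import torch.nn.functional as F

def left_pad(sequences, pad_token_id):
    # Pad each 1-D token tensor on the left so all rows share one length.
    max_len = max(seq.size(0) for seq in sequences)
    return torch.stack([
        F.pad(seq, (max_len - seq.size(0), 0), value=pad_token_id)
        for seq in sequences
    ])

# With left padding, every sequence ends at the same right edge, so downstream
# code can locate the response tokens by counting back from the end.
batch = left_pad([torch.tensor([1, 2, 3]), torch.tensor([4, 5])], pad_token_id=0)
# tensor([[1, 2, 3],
#         [0, 4, 5]])
```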

Hi, I notice you cite "70B+ Full Tuning with 16 A100"; however, this is also something that trlX (and that we worked very hard to add ;) ) supports via...

Model list:
1. DeepSeek
2. Gemma

enhancement

Hi! Thanks for your work on OpenRLHF. I trained a 4-bit Qwen-based reward model with this config (see the defaults):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--pretrain", type=str, default="Qwen/Qwen1.5-7B")
parser.add_argument("--dataset", type=str, default="Anthropic/hh-rlhf")
parser.add_argument("--dataset", type=str, default="nz/highest-number-rlhf")
...
```
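A side note on the quoted defaults, independent of the 4-bit issue itself: argparse refuses to register the same option string twice, so keeping both `--dataset` lines would fail before training even starts. A self-contained demonstration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="Anthropic/hh-rlhf")
try:
    # Re-registering the same flag is rejected by argparse.
    parser.add_argument("--dataset", type=str, default="nz/highest-number-rlhf")
except argparse.ArgumentError as e:
    print(e)  # argument --dataset: conflicting option strings: --dataset
```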

Supports pip install and pre-built containers; all functions then support one-click training by passing args.

I have some thoughts about using vLLM for generation. Feel free to correct me if I am wrong. 1. Batching: it seems that prompts are still passed to vLLM engines...
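For reference, the batched path being discussed looks roughly like this with vLLM's offline API; the model checkpoint and prompts here are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder checkpoint
params = SamplingParams(temperature=0.8, max_tokens=256)

# Passing the whole prompt list at once lets vLLM's continuous batching
# schedule requests together rather than generating one prompt at a time.
prompts = ["prompt 1 ...", "prompt 2 ..."]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```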

help wanted