OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (supports 70B+ full tuning & LoRA & Mixtral & KTO)
Hi Team, it would be great if KubeRay commands for running OpenRLHF were added to the docs, to make the cold-start setup easier.
When I set `save_step` to a value other than -1, the program raises an exception:
```
self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'
```
https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385 These three...
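For anyone hitting this before a fix lands, here is a minimal sketch of a workaround, assuming the launch script builds `args` with `argparse`: declare the three attributes the trainer reads. The flag names are copied from the traceback; the defaults are placeholders, not the project's official values.
```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical workaround: define the checkpoint arguments the trainer
# expects on args, so save_step != -1 no longer hits missing attributes.
parser.add_argument("--ckpt_path", type=str, default="./ckpt/ppo",
                    help="directory for intermediate checkpoints (placeholder default)")
parser.add_argument("--max_ckpt_num", type=int, default=3,
                    help="keep at most this many checkpoints (placeholder default)")
parser.add_argument("--max_ckpt_mem", type=int, default=1000,
                    help="approximate disk budget for kept checkpoints (placeholder default)")
args = parser.parse_args()
```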
The training configuration is as follows:
```
--ref_num_nodes 1 --ref_num_gpus_per_node 2
--reward_num_nodes 1 --reward_num_gpus_per_node 2
--critic_num_nodes 1 --critic_num_gpus_per_node 4
--actor_num_nodes 2 --actor_num_gpus_per_node 8
--vllm_num_engines 2 --vllm_tensor_parallel_size 4
--micro_train_batch_size 4 --train_batch_size 64
--micro_rollout_batch_size 4 --rollout_batch_size 64...
```
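For reference, a quick tally (plain Python, not OpenRLHF code) of the GPUs this configuration requests per role:
```python
# GPUs implied by the flags above: nodes * gpus_per_node per role,
# and num_engines * tensor_parallel_size for the vLLM engines.
gpus = {
    "ref":    1 * 2,
    "reward": 1 * 2,
    "critic": 1 * 4,
    "actor":  2 * 8,
    "vllm":   2 * 4,
}
print(gpus, "total:", sum(gpus.values()))  # total: 32
```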
Is ORPO being considered as a new feature? ORPO, a technique that replaces SFT+DPO/PPO, was released recently. I saw @_philschmid's post about it yesterday. Gave ORPO a shot with phi-2 and...
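For context, here is a minimal PyTorch sketch of the ORPO objective as described in the paper; this is not OpenRLHF code, and the weight `lam` and the length-normalized log-probs are assumptions about how the inputs would be prepared:
```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_*: length-normalized sequence log-probs, i.e. the mean
    # per-token log P(y|x) of each response in the batch.
    def log_odds(logp):
        # log(p / (1 - p)) computed stably from log p
        return logp - torch.log1p(-torch.exp(logp))

    # odds-ratio preference term: -log sigmoid(log_odds(w) - log_odds(l))
    l_or = -F.logsigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))
    # ORPO = SFT NLL on the chosen response + weighted odds-ratio term
    return (nll_chosen + lam * l_or).mean()
```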
Single-node DeepSpeed launch
`RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu!` Several places in the code put the EMA model on a GPU device:
```
if...
```
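A hedged sketch of one way to avoid the mismatch, assuming the usual EMA update loop: move each source parameter to the EMA parameter's device before the in-place update. The function name here is illustrative, not the project's:
```python
import torch

@torch.no_grad()
def update_ema(model, ema_model, beta=0.992):
    # ema = beta * ema + (1 - beta) * p, done on the EMA tensor's own
    # device, so cuda:2 parameters never mix with a CPU-resident EMA copy.
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.lerp_(p.detach().to(ema_p.device), 1.0 - beta)
```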
From the code, the tokenizer of the reward model seems to be the same as the policy model's?
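If the two models do ship with different tokenizers, a safer (hypothetical) pattern is to decode with the policy tokenizer and re-encode with the reward model's own tokenizer before scoring, as in this sketch:
```python
import torch

def score_with_reward_model(generated_ids, policy_tok, reward_tok, reward_model):
    # Decode with the tokenizer the tokens were produced by, then
    # re-tokenize for the reward model so its vocabulary/ids line up.
    text = policy_tok.decode(generated_ids, skip_special_tokens=True)
    inputs = reward_tok(text, return_tensors="pt")
    with torch.no_grad():
        return reward_model(**inputs)
```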
I see in `RemoteExperienceMaker._generate_vllm()`, [line 375](https://github.com/OpenLLMAI/OpenRLHF/blob/4e15591a5abd19a14e4a72415603bce76c3e1567/openrlhf/trainer/ppo_utils/experience_maker.py#L375), that for generations that don't finish, i.e. don't output the EOS token within the max token limit, we manually set the last token to...
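For readers skimming, the pattern being questioned looks roughly like this sketch (my paraphrase of forcing a terminal EOS, not the actual OpenRLHF code):
```python
import torch

def force_eos(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # For rows that hit the max-token limit without emitting EOS,
    # overwrite the final position with the EOS id in place, so
    # downstream reward/value code sees a terminated sequence.
    unfinished = sequences[:, -1] != eos_token_id
    sequences[unfinished, -1] = eos_token_id
    return sequences
```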