OpenRLHF issues

Update NGC and vllm version.

2

Thanks for the wonderful project! Does OpenRLHF support the latest version of vLLM and newer NGC containers (like 24.02), now that vLLM has been updated to v0.4.1? Or...

THINK2TRY

[Baseline] LLaMA2-7B RLHF training curves

1

```
deepspeed ./train_ppo.py \
  --pretrain OpenLLMAI/Llama-2-7b-sft-model-ocra-500k \
  --reward_pretrain OpenLLMAI/Llama-2-7b-rm-anthropic_hh-lmsys-oasst-webgpt \
  --save_path ./ckpt/7b_llama \
  --save_steps -1 \
  --logging_steps 1 \
  --eval_steps -1 \
  --micro_train_batch_size 2 \
  --train_batch_size 128 \
  --micro_rollout_batch_size 4...
```

hijkzzz

vLLM + ZeRO-2 hangs

32

Team, thank you so much for this wonderful toolkit! We are trying to test the vLLM setting with the mistralai/Mistral-7B-Instruct-v0.2 model under ZeRO-2.

![image](https://github.com/OpenLLMAI/OpenRLHF/assets/35610230/b97439b6-ee2f-4598-9134-74ec075b9ef5)

ray job submit --address="http://127.0.0.1:8265" \ --runtime-env-json='{"working_dir": "/openrlhf",...

karthik19967829

The configuration for Llama-7b on 4 RTX4090

5

Hello, I want to run train_ppo_llama_ray.sh on 4 RTX4090 GPUs. Should I modify actor_num_gpus_per_node/critic_num_gpus_per_node in train_ppo_llama_ray.sh? The default script is for 8 GPUs, so what else should I pay...

LinkyLiu

Reward model dataset question

3

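I added a code dataset during fine-tuning, which gave the model fairly good coding ability. When training the reward model in the RLHF stage, do I need to include the code dataset again? If I don't, will the model's coding ability degrade?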

burger-pb

documentation

How long does tuning a single LLM require?

3

alphahumancoder

PPO training configuration for train_ppo_llama.sh

1

Hello, I want to run train_ppo_llama.sh on 4 A100 80G GPUs. Do I need to reconfigure the GPU allocation of the 4 models?

MurrayTom

Issue with models not using `position_ids`

1

I think this issue relates to #217 and #218. Some models, for example `facebook/opt-1.3b`, don't accept `position_ids` as an argument, but the current implementation passes it (used [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/actor.py#L176), [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/model.py#L201) and [here](https://github.com/OpenLLMAI/OpenRLHF/blob/9f1707201d8a8f40ece44abd6008c9ba56d02cb8/openrlhf/models/model.py#L262))...

kfertakis
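
One possible way to guard against the `position_ids` problem described above is to inspect the model's `forward` signature before building the call. The following is only a minimal sketch, not the project's actual fix: the helper names `accepts_position_ids` and `build_forward_kwargs` and the two stand-in `forward_*` functions are hypothetical, and it assumes the model declares `position_ids` as an explicit parameter rather than accepting it through `**kwargs`.

```python
import inspect


def accepts_position_ids(forward_fn):
    """Return True if the callable declares an explicit `position_ids` parameter."""
    return "position_ids" in inspect.signature(forward_fn).parameters


def build_forward_kwargs(forward_fn, input_ids, attention_mask, position_ids):
    """Assemble keyword arguments, adding `position_ids` only when it is accepted."""
    kwargs = {"input_ids": input_ids, "attention_mask": attention_mask}
    if accepts_position_ids(forward_fn):
        kwargs["position_ids"] = position_ids
    return kwargs


# Hypothetical stand-ins for two model `forward` signatures (not real models).
def forward_with_position_ids(input_ids, attention_mask=None, position_ids=None):
    return "used position_ids"


def forward_without_position_ids(input_ids, attention_mask=None):
    return "no position_ids"


# Both calls succeed; the second would raise TypeError if position_ids were forced.
for fn in (forward_with_position_ids, forward_without_position_ids):
    print(fn.__name__, "->", fn(**build_forward_kwargs(fn, [1, 2], [1, 1], [0, 1])))
```

With a real model object, the same check would presumably be applied to `model.forward`; whether that covers every architecture is an assumption that would need testing.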

fix vLLM v0.4.1

Out-of-memory problem
