
Training problem for the GPT-j RLHF example

Open LiuX41 opened this issue 2 years ago • 0 comments

I'm trying to run RLHF with GPT-J (the example in examples/summarize_rlhf) on two RTX 3090 GPUs (24 GB of memory each). To make the model fit on my machine and avoid CUDA out-of-memory errors, I changed the PPO config: num_rollouts from 128 to 32, chunk_size from 16 to 4, and the batch_size in TrainConfig to 1. Everything else is unchanged. However, during the fine-tuning run (trlx_gptj_text_summarization.py), the reward keeps dropping, from 0 to -3.48 after 6k training steps. Is there any way to determine the cause of this strange behavior? Thanks!
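For reference, a minimal sketch of the changes described above, written as plain dicts. The field grouping loosely mirrors trlX's TRLConfig layout (a `method` section for PPO settings and a `train` section), but the names here are illustrative, not the library's exact API:

```python
# Hypothetical sketch of the config changes described in this issue.
# Field layout is an assumption modeled on trlX's TRLConfig; check the
# actual configs/ppo_config_summ_gptj.yml in examples/summarize_rlhf.

# Reduced settings used to fit two 24 GB RTX 3090s:
low_mem_overrides = {
    "method": {
        "num_rollouts": 32,  # default in the example: 128
        "chunk_size": 4,     # default in the example: 16
    },
    "train": {
        "batch_size": 1,     # reduced to avoid CUDA OOM
    },
}

# Note: shrinking num_rollouts and chunk_size this much also shrinks the
# effective PPO batch, which can make advantage estimates noisier and
# training less stable -- a plausible contributor to the falling reward.
```

One common mitigation when the per-step batch must stay tiny is gradient accumulation, which recovers a larger effective batch without more memory.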

LiuX41 avatar Apr 10 '23 07:04 LiuX41