
Collection of links, tutorials, and best practices for collecting data and building an end-to-end RLHF system to fine-tune generative AI models

4 RLHF issues

The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**. After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model/`, I still get the same error....
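In case it helps other readers, a minimal sketch of one common workaround is below. It assumes the failing step is an import of the `reward_model` package from the trlX `summarize_rlhf` example cloned into `/content/trlx` (as in the path above); the exact import line is an assumption based on that example's layout, not a confirmed fix for this issue.

```python
# A sketch of a possible workaround, assuming the notebook clones trlX into
# /content/trlx and the failure is an import error for the reward_model package.
import sys

# Besides adding __init__.py, make the example directory itself importable so
# `reward_model` resolves when the cell runs outside that directory.
sys.path.append("/content/trlx/examples/summarize_rlhf")

from reward_model.reward_model import GPTRewardModel  # noqa: E402
```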

I would like to modify the code in this repository and use it as lecture material.

Where can I find the 'REWARD_CHECKPOINT_PATH' checkpoint as a .bin file?

```
  3 rw_tokenizer.pad_token = rw_tokenizer.eos_token
  4 rw_model = GPTRewardModel(SFT_MODEL_PATH)
----> 5 rw_model.load_state_dict(REWARD_CHECKPOINT_PATH)
  6 rw_model.half()
  7 rw_model.eval()
```
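For what it's worth, `load_state_dict` expects a state dict rather than a file path, so the .bin file has to be read with `torch.load` first. Below is a minimal sketch of that pattern; the Hugging Face Hub repo id and filename are placeholders, since the excerpt does not say where the checkpoint file actually comes from.

```python
# A sketch of loading the reward checkpoint, assuming REWARD_CHECKPOINT_PATH
# should point at a local PyTorch .bin file. The repo_id/filename below are
# placeholders, not the repository's actual checkpoint location.
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint (substitute the repo the notebook actually uses).
REWARD_CHECKPOINT_PATH = hf_hub_download(
    repo_id="<reward-model-repo>", filename="pytorch_model.bin"
)

# GPTRewardModel and SFT_MODEL_PATH are defined earlier in the notebook.
rw_model = GPTRewardModel(SFT_MODEL_PATH)
# load_state_dict takes a state dict, not a path, so load the file first.
rw_model.load_state_dict(torch.load(REWARD_CHECKPOINT_PATH, map_location="cpu"))
rw_model.half()
rw_model.eval()
```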

Why are the rewards truncated in the `GPTRewardModel` class? What is the reason, and where can I find more information about it? `# Retrieve first index where trajectories diverge divergence_ind...`
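A plausible reading, sketched below rather than quoted from the trlX source: the chosen and rejected sequences in a preference pair share the same prompt, so their per-token rewards are identical over that common prefix. Slicing from the first diverging token to the end of the non-padded text makes the pairwise loss compare only the tokens that actually differ between the two completions. The helper below is an illustrative simplification, not the repository's exact implementation.

```python
# Illustrative sketch (not the exact trlX code) of slicing rewards at the
# divergence index before computing the pairwise preference loss.
import torch

def pairwise_loss(chosen_ids, rejected_ids, chosen_rewards, rejected_rewards, pad_id):
    """All inputs are 1-D tensors of equal (padded) length; the pair is
    assumed to differ in at least one token."""
    # First position where the two sequences differ: everything before it
    # (the prompt plus any common prefix) contributes nothing to the comparison.
    divergence_ind = (chosen_ids != rejected_ids).nonzero()[0].item()
    # End of the non-padding region in either sequence.
    c_pad = (chosen_ids == pad_id).nonzero()
    r_pad = (rejected_ids == pad_id).nonzero()
    c_end = c_pad[0].item() if len(c_pad) else len(chosen_ids)
    r_end = r_pad[0].item() if len(r_pad) else len(rejected_ids)
    end_ind = max(c_end, r_end)
    # Compare rewards only over the divergent, non-padded span.
    c_trunc = chosen_rewards[divergence_ind:end_ind]
    r_trunc = rejected_rewards[divergence_ind:end_ind]
    return -torch.log(torch.sigmoid(c_trunc - r_trunc)).mean()
```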