RLHF
A collection of links, tutorials, and best practices for collecting data and building an end-to-end RLHF system to fine-tune generative AI models.
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**. After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model/`, I still get the same error....
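If adding `__init__.py` alone doesn't make the package importable in Colab, one common workaround (an assumption here, not the repository's documented fix) is to put the example directory on Python's module search path before importing:

```python
import sys

# Hypothetical fix: add the summarize_rlhf example directory to sys.path
# so that `reward_model` resolves as a top-level package.
sys.path.append("/content/trlx/examples/summarize_rlhf")

# Import path assumed from the trlX example layout
# (examples/summarize_rlhf/reward_model/reward_model.py).
from reward_model.reward_model import GPTRewardModel
```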
I would like to modify the code in this repository and use it as lecture material.
Where can I find this `REWARD_CHECKPOINT_PATH` as a `.bin` file?

```python
      3 rw_tokenizer.pad_token = rw_tokenizer.eos_token
      4 rw_model = GPTRewardModel(SFT_MODEL_PATH)
----> 5 rw_model.load_state_dict(REWARD_CHECKPOINT_PATH)
      6 rw_model.half()
      7 rw_model.eval()
```
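Note that PyTorch's `load_state_dict` expects a state-dict object, not a file path, so a `.bin` checkpoint is normally deserialized with `torch.load` first. A minimal sketch of the usual loading pattern; the model id and checkpoint path below are assumptions based on the trlX summarize_rlhf example, not confirmed values:

```python
import torch
from reward_model.reward_model import GPTRewardModel  # import path is an assumption

# Hypothetical paths: the trlX summarize_rlhf example downloads the reward
# checkpoint (a saved state dict in .bin format) into reward_model/rm_checkpoint/.
SFT_MODEL_PATH = "CarperAI/openai_summarize_tldr_sft"
REWARD_CHECKPOINT_PATH = "reward_model/rm_checkpoint/pytorch_model.bin"

rw_model = GPTRewardModel(SFT_MODEL_PATH)
# Deserialize the checkpoint file into a state dict before loading it;
# passing the path string directly is what triggers the error above.
rw_model.load_state_dict(torch.load(REWARD_CHECKPOINT_PATH))
rw_model.half()
rw_model.eval()
```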
Why are the rewards truncated in the `GPTRewardModel` class? What is the reason, and where can I find more information about it?

```python
# Retrieve first index where trajectories diverge
divergence_ind...
```
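The underlying idea is that the chosen and rejected sequences share an identical prompt prefix (and may end in padding), and identical tokens carry no preference signal, so the pairwise loss is computed only over the span where the two trajectories actually differ. A minimal sketch of that truncation, not the repository's exact code (tensor names and the loss form are assumptions):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_ids, rejected_ids,
                         chosen_rewards, rejected_rewards, pad_id):
    """Hypothetical pairwise reward-model loss illustrating why per-token
    rewards are truncated to the divergent span of each chosen/rejected pair.

    chosen_ids / rejected_ids:        (batch, seq_len) token ids sharing a prompt prefix
    chosen_rewards / rejected_rewards: (batch, seq_len) per-token scalar rewards
    """
    batch_size = chosen_ids.shape[0]
    loss = 0.0
    for i in range(batch_size):
        # Retrieve first index where the trajectories diverge: the shared
        # prompt tokens are identical, so comparing rewards there is meaningless.
        diff = (chosen_ids[i] != rejected_ids[i]).nonzero()
        divergence_ind = diff[0].item() if len(diff) > 0 else 0

        # Last real (non-pad) token of each sequence; score up to the longer one.
        c_end = (chosen_ids[i] != pad_id).nonzero()[-1].item() + 1
        r_end = (rejected_ids[i] != pad_id).nonzero()[-1].item() + 1
        end_ind = max(c_end, r_end)

        # Truncate rewards to the divergent span and push chosen > rejected.
        c_trunc = chosen_rewards[i][divergence_ind:end_ind]
        r_trunc = rejected_rewards[i][divergence_ind:end_ind]
        loss += -F.logsigmoid(c_trunc - r_trunc).mean()
    return loss / batch_size
```

Scoring only the divergent slice keeps the identical prompt tokens and padding from diluting the preference signal in the sigmoid loss.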