coax
Example of using this lib for RLHF?
Just wondering if there are any examples of using this lib to implement RLHF (Reinforcement Learning from Human Feedback)?
Inspired by: https://openai.com/blog/chatgpt

Many thanks for any help! :)