Adding RLHF support
I just noticed that we don't have an open issue for RLHF support yet. I think this is a super important feature, since recent models like Llama 2 showed that it's really worthwhile. I expect demand for this to grow in the coming months, and it'd be nice to add it to the roadmap.
We could potentially look into https://github.com/Eclectic-Sheep/sheeprl for this. It looks like the project already uses Fabric.
I would suggest focusing only on PPO for now, though.
Hi, there is an open PR for SheepRL using Hugging Face transformers for now: https://github.com/Eclectic-Sheep/sheeprl/pull/16. Once SheepRL supports Lightning 2.1.0, I can try to integrate Lit-GPT for RLHF.
This would be amazing @rcmalli !
Hi @rcmalli, how is SheepRL's support for Lightning 2.1.0 coming along?
Hi everyone, we've just released our new repo for RLHF: https://github.com/Eclectic-Sheep/sheeprlhf. Right now it works with HF models only, but we're also working to integrate lit-gpt :zap: We have a dedicated branch here: https://github.com/Eclectic-Sheep/sheeprlhf/tree/feature/lit-gpt-integration
Thank you for proposing RLHF support! After evaluation, we're deprioritizing this feature due to the outdated LightningAI/trl fork, RLHF's high computational demands, and the availability of simpler alternatives like DPO, which is supported by the actively maintained huggingface/trl library. Existing tools like axolotl and LLaMA-Factory also cover RLHF needs, and we see limited community engagement on this issue.
We're exploring DPO integration as a lightweight alignment option, potentially as litgpt finetune dpo, aligning with LitGPT's efficiency goals. Please share feedback on DPO or RLHF use cases to help guide our roadmap!
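For anyone unfamiliar with why DPO is the lighter option: it drops the reward model and the PPO sampling loop, and instead optimizes a simple logistic loss over preference pairs. Here's a minimal sketch of the per-pair loss (the function name and signature are illustrative, not LitGPT API; inputs are summed log-probabilities of each response under the policy and a frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    beta controls how far the policy may drift from the reference model.
    """
    # Implicit rewards: how much the policy upweights each response
    # relative to the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen answer.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; raising the chosen response's log-probability lowers it, which is the whole training signal — no reward model, no rollouts.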