litgpt

Adding RLHF support

Open rasbt opened this issue 2 years ago • 5 comments

I just noticed that we don't have an open issue for RLHF support yet. I think this is a super important feature, since recent models like Llama 2 have shown that it's really worthwhile. I also expect more demand for this in the upcoming months, so it'd be nice to add it to the roadmap.

We could potentially look into https://github.com/Eclectic-Sheep/sheeprl for this. It looks like the project itself uses Fabric.

I would suggest focusing only on PPO for now, though.

rasbt avatar Sep 16 '23 19:09 rasbt

Hi, there is an open PR for SheepRL that uses Hugging Face transformers for now: https://github.com/Eclectic-Sheep/sheeprl/pull/16. Once SheepRL supports lightning 2.1.0, I can try to integrate Lit-GPT for RLHF.

rcmalli avatar Oct 16 '23 09:10 rcmalli

This would be amazing @rcmalli !

rasbt avatar Oct 18 '23 21:10 rasbt

hi @rcmalli, how is the support for SheepRL with lightning 2.1.0 going?

aniketmaurya avatar Nov 22 '23 15:11 aniketmaurya

Hi everyone, we've just released our new repo for RLHF: https://github.com/Eclectic-Sheep/sheeprlhf. Right now it works with HF models only, but we're also working on integrating lit-gpt :zap: We have a dedicated branch here: https://github.com/Eclectic-Sheep/sheeprlhf/tree/feature/lit-gpt-integration

belerico avatar Nov 24 '23 14:11 belerico

Thank you for proposing RLHF support! After evaluation, we’re deprioritizing this feature due to the outdated LightningAI/trl fork, RLHF’s high computational demands, and the availability of simpler alternatives like DPO, supported by the active huggingface/trl library. Existing tools like axolotl and LLaMA-Factory also cover RLHF needs, and we see limited community engagement on this issue.

We’re exploring DPO integration as a lightweight alignment option, potentially as litgpt finetune dp, aligning with LitGPT’s efficiency goals. Please share feedback on DPO or RLHF use cases to help guide our roadmap!

Borda avatar Jun 11 '25 13:06 Borda
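
For anyone weighing the DPO option mentioned above, here is a minimal sketch of the per-pair DPO loss, to show why it is lighter than PPO-based RLHF: it only needs log-probabilities from the policy and a frozen reference model, with no reward model or rollout loop. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, not part of LitGPT or any library API.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Hypothetical helper: DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss sits at log(2).
baseline = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# Policy already prefers the chosen response: the loss drops below log(2).
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

In practice the log-probabilities would come from batched forward passes of the two models, and the loss would be averaged over the batch; this sketch only covers the scalar arithmetic.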