label-studio-ml-backend

I want to create an RLHF backend/frontend for a labelling <=> training <=> error-correction loop.

Open hemangjoshi37a opened this issue 2 years ago • 8 comments

If anyone has any leads on this, please let me know. Also, if anyone wants to collaborate in this direction, please let me know.

hemangjoshi37a avatar Mar 27 '23 18:03 hemangjoshi37a

Have you checked active learning?

https://docs.heartex.com/guide/active_learning.html

https://www.youtube.com/watch?v=8EO4vOw1MZc

makseq avatar Mar 29 '23 03:03 makseq

@makseq Active learning is good, but RLHF is quite different because it uses reinforcement learning to optimize the model. If you know what RLHF is, you will see that it is not the same thing as active learning.

hemangjoshi37a avatar Mar 29 '23 06:03 hemangjoshi37a

Yes, I know, but I would like to see the LS workflow you have in mind to achieve it. It seems you need Accept/Reject actions for your annotations, or ranking?

makseq avatar Mar 31 '23 01:03 makseq

Yes, RLHF can be done in multiple ways. The feedback can be yes/no or it can be a ranking.
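
As a rough illustration of the two feedback types, here is a minimal sketch of how either one could be reduced to (preferred, rejected) pairs for later training. The function names and the input shapes are assumptions made up for this example; they are not part of Label Studio or any existing backend.

```python
from itertools import combinations


def pairs_from_ranking(ranked_responses):
    """Turn a ranking (best response first) into (preferred, rejected) pairs.

    Hypothetical helper: every response is preferred over all responses
    ranked below it.
    """
    # combinations() keeps input order, so the first element of each pair
    # is always the higher-ranked response.
    return list(combinations(ranked_responses, 2))


def pairs_from_accept_reject(labelled):
    """Turn yes/no feedback into (preferred, rejected) pairs.

    Hypothetical helper: `labelled` is a list of (response, accepted) tuples;
    each accepted response is paired against each rejected one.
    """
    accepted = [resp for resp, ok in labelled if ok]
    rejected = [resp for resp, ok in labelled if not ok]
    return [(a, r) for a in accepted for r in rejected]


ranking_pairs = pairs_from_ranking(["a", "b", "c"])
yes_no_pairs = pairs_from_accept_reject([("x", True), ("y", False), ("z", False)])
```

Both feedback types end up in the same pair format, so one training loop could consume either.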

hemangjoshi37a avatar Mar 31 '23 05:03 hemangjoshi37a

Basically, what I propose is to have a generalized RLHF model that sits at the output side of any model, so that instead of supervised training we can have unsupervised training that is supervised by the reinforcement model.
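
To make the idea of a model that scores another model's outputs more concrete, here is a minimal sketch of the pairwise preference loss that is commonly used to train a reward model in RLHF setups. It is not taken from this thread or from any Label Studio code; the function name is hypothetical, and the scalar rewards stand in for whatever a real reward model would output.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the human-preferred output gets the higher
    reward, and large when the ordering is reversed.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Rewards are placeholder numbers, not outputs of a trained model.
loss_agrees = pairwise_preference_loss(2.0, 0.0)     # reward order matches the human preference
loss_disagrees = pairwise_preference_loss(0.0, 2.0)  # reward order contradicts it
```

Minimizing this loss over human-labelled pairs pushes the reward model to rank outputs the way annotators did; the resulting reward can then drive a reinforcement learning step on the main model.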

hemangjoshi37a avatar Mar 31 '23 05:03 hemangjoshi37a

Maybe this repo will be helpful for you: https://github.com/heartexlabs/label-studio-RLHF/

makseq avatar Apr 14 '23 01:04 makseq

@makseq Maybe it is a private repo; it is giving me a 404 error.

hemangjoshi37a avatar Apr 14 '23 04:04 hemangjoshi37a

@hemangjoshi37a Sorry, could you please check this one? https://github.com/heartexlabs/RLHF

makseq avatar Apr 28 '23 01:04 makseq