Question regarding viability of QLoRA with RLHF using TRLX

Open · BenSturgeon opened this issue 1 year ago • 1 comment

I am curious whether this might work in an RLHF process using something like the TRLX library https://github.com/CarperAI/trlx.

Do you think this could be adapted to work with that library?

BenSturgeon avatar Jun 01 '23 17:06 BenSturgeon

Hello and thanks for the question. The short answer is yes!

QLoRA wraps a model by quantizing the base model to 4-bit, freezing it, and adding trainable adapters. In theory, the wrapped model can be trained with any objective you would use with a standard model. Note that we did not test QLoRA with RL, but given its strong performance on supervised learning tasks, I have no reason to believe it wouldn't work with RL.
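As a rough illustration of the wrapping described above (not code from the QLoRA repository), here is a toy, pure-Python sketch of a single linear layer whose base weights are stored as frozen 4-bit codes while only a low-rank adapter is updated. The names `quantize_4bit` and `ToyQLoRALinear` are hypothetical. The quantizer is plain absmax rounding rather than the NF4 data type QLoRA actually uses, and the objective is a simple squared error, chosen only to keep the example short. An RL objective would feed a different error signal into the same adapter update.

```python
import random


def quantize_4bit(matrix):
    """Absmax-quantize a matrix to signed 4-bit codes in [-7, 7] plus one scale.

    Illustrative only: real QLoRA uses the NF4 data type with block-wise scales.
    """
    peak = max(abs(w) for row in matrix for w in row)
    scale = peak / 7 if peak > 0 else 1.0
    codes = [[round(w / scale) for w in row] for row in matrix]
    return codes, scale


class ToyQLoRALinear:
    """Hypothetical toy layer: y = dequant(codes) @ x + B @ (A @ x)."""

    def __init__(self, weight, rank, seed=0):
        # Frozen base: stored as 4-bit codes and never updated during training.
        self.codes, self.scale = quantize_4bit(weight)
        rng = random.Random(seed)
        n_out, n_in = len(weight), len(weight[0])
        # LoRA-style init: A is random and B is zero, so the adapter starts as a no-op.
        self.A = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(n_out)]

    def forward(self, x):
        h = [sum(a * xj for a, xj in zip(row, x)) for row in self.A]
        base = [self.scale * sum(c * xj for c, xj in zip(row, x)) for row in self.codes]
        y = [b + sum(bk * hk for bk, hk in zip(brow, h)) for b, brow in zip(base, self.B)]
        return y, h

    def sgd_step(self, x, target, lr):
        """One gradient step on 0.5 * ||y - target||^2, touching only A and B."""
        y, h = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        loss = 0.5 * sum(e * e for e in err)
        rank = len(self.A)
        # Gradient flowing back into the adapter's hidden activations: B^T err.
        back = [sum(self.B[i][k] * err[i] for i in range(len(err))) for k in range(rank)]
        for i, e in enumerate(err):
            for k in range(rank):
                self.B[i][k] -= lr * e * h[k]
        for k in range(rank):
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * back[k] * xj
        return loss


layer = ToyQLoRALinear([[0.5, -0.25, 0.1], [0.3, 0.8, -0.6]], rank=2)
codes_before = [row[:] for row in layer.codes]
losses = [layer.sgd_step([1.0, -2.0, 0.5], [1.0, 0.0], lr=0.02) for _ in range(200)]
print("first loss:", losses[0], "last loss:", losses[-1])
print("base codes unchanged:", layer.codes == codes_before)
```

The point of the sketch is that the training step never needs to know the base weights are quantized: it only needs a forward pass and gradients for the adapter, which is why the choice of objective is, in principle, independent of the wrapping.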

artidoro avatar Jun 01 '23 19:06 artidoro

Awesome, thank you so much for your response!

I was thinking of trying to get this to work with the TRLX library. Do you have any recommendations on how to integrate these two projects? Both seem to rely on the HuggingFace API for many of the basic tools, but I am still learning a lot of these tools, so I am not sure where to start.

BenSturgeon avatar Jul 17 '23 12:07 BenSturgeon