llm_rlhf
About QLoRA
I have now implemented QLoRA for SFT and for the reward model, but I am quite confused about how to apply QLoRA to PPO. Do you plan to integrate PPO into the repo?