llm_rlhf
Implements reinforcement learning (RLHF) training for LLMs such as GPT-2, LLaMA, BLOOM, and other models.
The reward model is a scoring model — could it be replaced by human annotators? If so, would it be enough to assemble triples with corresponding human-assigned scores, and then train the model using the reinforcement learning approach?
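In principle, yes: the policy-gradient update only needs a scalar reward per (prompt, response) pair, regardless of whether that scalar comes from a learned reward model or a human label. Below is a minimal, hypothetical sketch (pure Python, not from this repo) where a dictionary of human scores stands in for the reward model and drives a toy REINFORCE update over two candidate responses; all names (`human_scores`, `logits`, the example prompt) are illustrative.

```python
import math
import random

random.seed(0)

prompt = "What is 2+2?"
responses = ["4", "5"]

# Human-assigned scores playing the role of reward_model(prompt, response).
human_scores = {
    (prompt, "4"): 1.0,   # preferred answer -> high reward
    (prompt, "5"): -1.0,  # dispreferred answer -> low reward
}

# Toy "policy": logits over the two candidate responses.
logits = [0.0, 0.0]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.5
for _ in range(50):
    probs = softmax(logits)
    # Sample a response from the current policy.
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = human_scores[(prompt, responses[i])]
    # REINFORCE update: grad of log pi(a) w.r.t. logits is one_hot(a) - probs.
    for j in range(len(logits)):
        logits[j] += lr * r * ((1.0 if j == i else 0.0) - probs[j])

probs = softmax(logits)
# After training, the policy strongly prefers the human-preferred answer "4".
```

The practical caveat is scale: PPO needs a reward for every sampled rollout during training, and humans cannot score fresh samples online, which is exactly why RLHF trains a reward model on a fixed set of human-labeled comparisons and then queries it instead.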
I have now implemented QLoRA for the SFT and reward models, but I am quite confused about how to apply QLoRA during PPO. Do you plan to integrate PPO into the repo?