reward-models topics

Vicuna-LoRA-RLHF-PyTorch

206 Stars, 18 Forks

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT...

jackaduma

chatgpt

finetune

gpt

llama

ChatGLM-LoRA-RLHF-PyTorch

125 Stars, 10 Forks

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatG...

jackaduma

chatglm

chatglm-6b

chatgpt

deepspeed

Alpaca-LoRA-RLHF-PyTorch

56 Stars, 6 Forks

A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT...

jackaduma

alpaca

chatgpt

deepspeed

finetune