deepspeed topic
l2hmc-qcd
Application of the L2HMC algorithm to simulations in lattice QCD.
safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
RLHF
Implementation of Chinese ChatGPT
transformers-language-modeling
Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3
KnowLM
An Open-sourced Knowledgeable Large Language Model Framework.
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatG...
llama2-lora-fine-tuning
Llama 2 fine-tuning with DeepSpeed and LoRA
Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT...