Training with DeepSpeed ZeRO-3
Hello, I know a lot of people want to train the OpenChat model, but getting access to modern GPUs like A100s or H100s can be difficult. So I tried using ZeRO-3 to train on a cheaper GPU system: 8x A10G with 24 GB VRAM each.
What I added in this pull request:
ochat/training_deepspeed/deepspeed_config_zero3.json
- Change the ZeRO strategy from ZeRO-2 to ZeRO-3.
- Offload optimizer state and parameters to CPU (if you want to use NVMe instead, edit the config file); see the sketch after this list.
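For reference, here is a minimal sketch of what a ZeRO-3 config with CPU offload can look like. This is an illustrative example rather than the exact contents of the file in this PR; the batch size, accumulation steps, and bf16 setting are placeholder assumptions:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

To offload to NVMe instead, set "device" to "nvme" and add an "nvme_path" pointing at a fast local disk. The stage3_gather_16bit_weights_on_model_save flag is what lets the engine write a consolidated 16-bit checkpoint at save time.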
ochat/training_deepspeed/train_zero3.py
- Change the optimizer from torch.optim.AdamW to deepspeed.ops.adam.DeepSpeedCPUAdam, since the optimizer step runs on the CPU when offloading.
- Save the model (in 16-bit) on all ranks; saving under ZeRO-3 is a collective operation, so every rank must participate. See the sketch after this list.
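A minimal sketch of how these two changes fit into a training script, assuming a standard deepspeed.initialize setup; the model, hyperparameters, and paths here are placeholders, not the PR's actual code:

```python
import deepspeed
import torch.nn as nn
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = nn.Linear(1024, 1024)  # placeholder; the real script builds the OpenChat model

# With CPU offload, the optimizer step runs on the CPU, so use
# DeepSpeedCPUAdam instead of torch.optim.AdamW (placeholder hyperparameters).
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=2e-5, weight_decay=0.0)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ochat/training_deepspeed/deepspeed_config_zero3.json",
)

# ... training loop using model_engine.backward(loss) and model_engine.step() ...

# Under ZeRO-3 the weights are sharded across ranks, so saving is a
# collective: every rank must call save_16bit_model, which gathers the
# shards and writes a consolidated 16-bit checkpoint (this requires
# "stage3_gather_16bit_weights_on_model_save": true in the config).
model_engine.save_16bit_model("checkpoints/final", "pytorch_model.bin")
```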
README.md
- Added instructions for training with ZeRO-3; an example launch command is sketched below.
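Something along these lines, assuming the standard deepspeed launcher on a single 8-GPU node; the training script's own flags are elided because they depend on the rest of the repo:

```bash
# Launch train_zero3.py on all 8 local GPUs with the DeepSpeed launcher;
# replace "..." with the script's own arguments.
deepspeed --num_gpus 8 ochat/training_deepspeed/train_zero3.py ...
```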