Is training with ZeRO-Infinity, using CPU memory and an NVMe drive, supported?

Open yt7589 opened this issue 1 year ago • 1 comment

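Right now I only have one machine with a single A100 40G, 128 GB of RAM, and a 1 TB NVMe drive. The official documentation says training is possible on 8 A100 80G GPUs. With ZeRO-Infinity, my machine should be able to train as well. Can this hardware support full-parameter training? Also, are PEFT fine-tuning methods such as LoRA supported?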

yt7589, Mar 06 '24 12:03

You can set cpu_offload=True; see https://colossalai.org/docs/basics/booster_plugins#low-level-zero-plugin. For NVMe offload, see https://colossalai.org/docs/features/nvme_offload. If your model and videos are small, you can also try full-parameter training without offloading. We only provide a pretraining script for now. A fine-tuning script may be provided in the future.
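As a rough illustration of the cpu_offload=True suggestion, here is a minimal sketch of how the flag could be passed to ColossalAI's LowLevelZeroPlugin. The helper names (zero_plugin_kwargs, build_booster) are hypothetical, and the stage and precision values are assumptions chosen for illustration, not settings taken from the Open-Sora training script.

```python
def zero_plugin_kwargs(cpu_offload: bool = True) -> dict:
    """Keyword arguments for LowLevelZeroPlugin (illustrative values only)."""
    return {
        "stage": 2,                  # assumption: ZeRO stage 2 (shard optimizer states and gradients)
        "precision": "fp16",         # assumption: mixed-precision training
        "cpu_offload": cpu_offload,  # the flag mentioned in the reply above
    }


def build_booster(cpu_offload: bool = True):
    """Create a Booster with the low-level ZeRO plugin (hypothetical helper)."""
    # Imported lazily so zero_plugin_kwargs can be inspected without colossalai installed.
    from colossalai.booster import Booster
    from colossalai.booster.plugin import LowLevelZeroPlugin

    plugin = LowLevelZeroPlugin(**zero_plugin_kwargs(cpu_offload))
    return Booster(plugin=plugin)
```

Once the distributed environment is initialized, the model and optimizer would be wrapped with booster.boost(model, optimizer) before the training loop. Whether this fits in 40 GB of GPU memory and 128 GB of RAM depends on the model size and video resolution, so treat it as a starting point rather than a guarantee.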

ver217, Mar 07 '24 02:03