
How can I use LoRA to train the 30B model across multiple machines with multiple GPUs?

Open Awyshw opened this issue 2 years ago • 3 comments

Awyshw avatar Apr 27 '23 06:04 Awyshw

We've tested that the current script runs on multiple GPUs on a single machine (8 x 40GB A100) using DeepSpeed ZeRO-3. The multi-node case has not been tested yet; it is a work in progress.

ZYHowell avatar Apr 30 '23 01:04 ZYHowell

We've tested that the current script runs on multiple GPUs on a single machine (8 x 40GB A100) using DeepSpeed ZeRO-3. The multi-node case has not been tested yet; it is a work in progress.

Is this still in progress?

cason0126 avatar May 09 '23 06:05 cason0126

mark

gebilaoman avatar Jun 09 '23 09:06 gebilaoman

@ZYHowell any progress on this? I've also seen an issue about using Slurm around here...

surak avatar Oct 21 '23 16:10 surak
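
For readers landing on this thread: the single-machine setup mentioned above (8 x 40GB A100 with DeepSpeed ZeRO-3) is driven by a DeepSpeed JSON config. The sketch below builds one in Python. It is only an illustration, not the config shipped with FastChat: the helper names `build_zero3_config` and `effective_batch_size`, the file name `ds_zero3.json`, and the batch-size values are all assumptions, and nothing here addresses the multi-node case, which the thread says is untested.

```python
import json


def build_zero3_config(micro_batch_per_gpu: int = 1, grad_accum_steps: int = 16) -> dict:
    """Return a minimal DeepSpeed ZeRO stage 3 config as a plain dict.

    The numeric defaults are hypothetical placeholders, not values
    taken from the FastChat repository.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # A100 GPUs support bfloat16; switch to an "fp16" section on older cards.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 partitions parameters, gradients and optimizer state across GPUs.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather the full 16-bit weights on save so the checkpoint is usable directly.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


def effective_batch_size(config: dict, num_gpus: int) -> int:
    """Global batch size = per-GPU micro batch * accumulation steps * number of GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


if __name__ == "__main__":
    config = build_zero3_config()
    # Write the config to disk so a launcher can pick it up (file name is an assumption).
    with open("ds_zero3.json", "w") as f:
        json.dump(config, f, indent=2)
    print("global batch size on 8 GPUs:", effective_batch_size(config, num_gpus=8))
```

If the training script is built on Hugging Face `TrainingArguments` (an assumption here), the resulting file can be passed through its `deepspeed` argument. The batch-size values should be tuned to what fits in memory for the 30B model.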