GaLore
How much GPU memory (in GB) is required to train a 7B model in DDP mode with GaLore?
In single-GPU mode I can run training successfully on an RTX 3090, but it takes too long. In DDP mode we get an OOM when wrapping the model:

```python
LlamaForCausalLM = torch.nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],
    output_device=local_rank,
    broadcast_buffers=False,
)
```