How many GB of memory are required to train the 7B model in DDP mode with GaLore?

Open · zhangqijun opened this issue 10 months ago · 1 comment

In single-GPU mode, I successfully ran the training on an RTX 3090, but it took too long. In DDP mode, we got an OOM at `LlamaForCausalLM = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank, broadcast_buffers=False)`.

zhangqijun · Apr 23 '24 06:04
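
For a rough sense of scale, here is a back-of-the-envelope sketch of the per-GPU memory floor under DDP. It is only an estimate under stated assumptions: 7e9 parameters, bf16 weights and gradients (2 bytes each), and every rank holding a full model replica plus a full set of gradients for the all-reduce. It ignores activations, optimizer state, GaLore projection matrices, and DDP's own bucket buffers, so the real figure would be higher. The function name `estimate_ddp_floor_gib` is hypothetical and not part of GaLore or PyTorch.

```python
def estimate_ddp_floor_gib(n_params: float = 7e9,
                           bytes_per_param: int = 2,
                           bytes_per_grad: int = 2) -> dict:
    """Lower-bound estimate of per-GPU memory for weights + gradients.

    Assumes (not taken from the issue): every DDP rank keeps a full
    replica of the weights and a full gradient tensor per parameter.
    """
    gib = 1024 ** 3
    weights = n_params * bytes_per_param / gib
    grads = n_params * bytes_per_grad / gib
    return {
        "weights_gib": weights,
        "grads_gib": grads,
        "total_gib": weights + grads,
    }


if __name__ == "__main__":
    est = estimate_ddp_floor_gib()
    for name, value in est.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, weights and gradients alone come to about 26 GiB per rank, which is already more than the 24 GB on an RTX 3090. That would be consistent with an OOM in DDP mode even when the single-GPU run fits, if the single-GPU run avoids holding all gradients at once.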