LOMO
Memory consumption first grows, then falls.
Dear authors, it is nice to see this amazing work. When I run this code, I noticed an interesting phenomenon: loading the model occupies more GPU memory than training does. Once training starts, GPU memory consumption stabilizes at a value slightly lower than the peak reached during loading.
For example, when I run openlm-research/open_llama_7b with `deepspeed --master_port "$port" --include localhost:"$CUDA_VISIBLE_DEVICES" src/train_lomo.py config/args_lomo.yaml` on a single V100 GPU, with batch_size set to 1 and the other options left at their default values, the GPU memory consumption is 18588 MB before training starts, and during training it stabilizes at 15933 MB. Can you provide more information about this phenomenon? Many thanks!
BTW, the PyTorch version is 2.0.
Hi. It's due to some intermediate variables created when calling `AutoModelForCausalLM.from_pretrained()`. When the DeepSpeed engine is initialized, `torch.cuda.empty_cache()` is called, which releases the memory occupied by these intermediate variables.
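For anyone curious, here is a minimal standalone sketch (not the LOMO code itself) that reproduces the same pattern: temporary tensors are freed in Python but their blocks stay in PyTorch's caching allocator (so nvidia-smi still shows the higher number), and only `torch.cuda.empty_cache()` hands the cached blocks back to the driver. The tensor sizes are arbitrary placeholders.

```python
import torch

def gpu_mem_mb():
    """Return (allocated, reserved) GPU memory in MB for the current device."""
    return (torch.cuda.memory_allocated() / 2**20,
            torch.cuda.memory_reserved() / 2**20)

# Simulate temporary buffers, e.g. the ones created while loading checkpoint shards.
scratch = [torch.empty(1024, 1024, device="cuda") for _ in range(64)]
print("after allocation        (alloc MB, reserved MB):", gpu_mem_mb())

# Drop the Python references; the caching allocator keeps the blocks reserved,
# so tools like nvidia-smi still report the higher usage.
del scratch
print("after del, before flush (alloc MB, reserved MB):", gpu_mem_mb())

# Releasing the cached blocks back to the driver, as happens when
# torch.cuda.empty_cache() is called during DeepSpeed engine init.
torch.cuda.empty_cache()
print("after empty_cache       (alloc MB, reserved MB):", gpu_mem_mb())
```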
Thank you very much for your reply!