
Memory consumption first rises, then falls.

Open zhenqin96 opened this issue 1 year ago • 3 comments

Dear authors, it is nice to see this amazing work. When I ran the code, I noticed an interesting phenomenon: loading the model occupies more GPU memory than training does. Once training starts, GPU memory consumption stabilizes at a value slightly lower than the peak reached during loading.

For example, when I run openlm-research/open_llama_7b with `deepspeed --master_port "$port" --include localhost:"$CUDA_VISIBLE_DEVICES" src/train_lomo.py config/args_lomo.yaml` on a single V100 GPU, with batch_size set to 1 and the other options left at their default values, GPU memory consumption is 18588MB before training starts and then stabilizes at 15933MB during training. Can you provide more information about this phenomenon? Many thanks!

zhenqin96 avatar Jun 29 '23 03:06 zhenqin96

BTW, the PyTorch version is 2.0.

zhenqin96 avatar Jun 29 '23 06:06 zhenqin96

Hi. It's due to intermediate variables created when calling AutoModelForCausalLM.from_pretrained(). When the DeepSpeed engine is initialized, torch.cuda.empty_cache() is called to release the memory occupied by those intermediate variables.
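To make the mechanism concrete, here is a minimal sketch (not taken from the LOMO codebase; the tensor size and the `report` helper are illustrative) of how PyTorch's caching allocator keeps freed CUDA blocks reserved until torch.cuda.empty_cache() hands them back to the driver, which is the drop you see in nvidia-smi:

```python
import torch

def report(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory in MB."""
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.0f}MB reserved={reserved:.0f}MB")

if torch.cuda.is_available():
    report("before")
    # Simulate temporary intermediates (e.g., buffers created while
    # loading/casting weights in from_pretrained()):
    tmp = torch.empty(1024, 1024, 128, device="cuda")  # ~512MB of fp32
    del tmp  # freed by Python, but the allocator keeps the block cached,
             # so nvidia-smi still counts it against the process
    report("after del (still reserved)")
    torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
    report("after empty_cache (reserved drops)")
```

Note that `memory_allocated()` already drops after `del`; only `memory_reserved()` (what nvidia-smi reflects) stays high until `empty_cache()` is called.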

KaiLv69 avatar Jun 29 '23 11:06 KaiLv69

> Hi. It's due to intermediate variables created when calling AutoModelForCausalLM.from_pretrained(). When the DeepSpeed engine is initialized, torch.cuda.empty_cache() is called to release the memory occupied by those intermediate variables.

Thank you very much for your reply!

zhenqin96 avatar Jun 29 '23 11:06 zhenqin96