volo
GPU memory increases every epoch when running volo-d2 without token labeling.
Hi, thanks for sharing VOLO — nice work.
I used:

```bash
export CUDA_VISIBLE_DEVICES=1,4,5,6
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
    --model volo_d2 --img-size 224 \
    -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
    --finetune ./d2_224_85.2.pth.tar
```
GPU memory kept increasing while I trained volo-d2 from the pretrained checkpoint, without token labeling, on my own dataset. I added no tricks, and after about 15 epochs the GPUs were nearly out of memory.
It's a common issue, similar to https://github.com/rwightman/pytorch-image-models/issues/80. Can you try adding the --no-prefetcher flag to see if it solves the problem?
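For reference, the suggestion above amounts to appending the flag to the launch command from the report (paths and hyperparameters are the reporter's, not recommendations):

```shell
export CUDA_VISIBLE_DEVICES=1,4,5,6
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
    --model volo_d2 --img-size 224 \
    -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
    --finetune ./d2_224_85.2.pth.tar \
    --no-prefetcher
```

With --no-prefetcher, the training script falls back to the standard PyTorch DataLoader path instead of the CUDA prefetching loader, which is where the leak was reported in the linked timm issue.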
I had the same issue, but thanks to @zihangJiang I solved it. Do you know why it happens?