volo
GPU memory increases every epoch when running volo-d2 without token labeling.
Hi, thanks for sharing VOLO — nice work.
I used:

```bash
export CUDA_VISIBLE_DEVICES=1,4,5,6
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
    --model volo_d2 --img-size 224 \
    -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
    --finetune ./d2_224_85.2.pth.tar
```
GPU memory kept increasing while I trained volo-d2 from the pretrained checkpoint, without token labeling, on my own dataset. I added no tricks, and after about 15 epochs the GPUs were nearly out of memory.
It's a common issue, similar to https://github.com/rwightman/pytorch-image-models/issues/80. Can you try adding the --no-prefetcher flag to see if it solves the problem?
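For reference, the suggestion above amounts to appending the flag to the launch command from the report (paths and hyperparameters are the reporter's, not recommendations):

```shell
export CUDA_VISIBLE_DEVICES=1,4,5,6
python -m torch.distributed.launch --nproc_per_node=4 main.py "path/to/dataset" \
    --model volo_d2 --img-size 224 \
    -b 100 --lr 1.0e-3 --drop-path 0.2 --epoch 300 --native-amp \
    --finetune ./d2_224_85.2.pth.tar \
    --no-prefetcher
```

With --no-prefetcher, the training script falls back to the standard PyTorch DataLoader path instead of the CUDA prefetching loader, which is where the leak was reported in the linked timm issue.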
I had the same issue, but thanks to @zihangJiang I solved it. Do you know why it happens?