pytorch-pruning
CUDA out of memory error: while allocating the memory
Hi,
I am working on a Tesla K40 machine with 12 GB of GPU memory, and I am hitting this error constantly. If I calculate the memory required for the VGG model at the batch size set in dataset.py, it comes out far below the GPU's available memory. What could be the reason, and how can I overcome this? I also hit the error right after initializing the model, while calling cuda().
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCCachingHostAllocator.cpp line=258 error=2 : out of memory
Traceback (most recent call last):
File "finetune.py", line 272, in
@jagadeesh09 Model parameters are not the only thing occupying GPU memory; reducing batch_size to 16 or smaller should help.
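For reference, a minimal sketch of that change, assuming dataset.py builds its loaders with torch.utils.data.DataLoader (the dataset object and other settings below are assumptions):

```python
import torch.utils.data as data

# Only batch_size matters for this fix: activation memory scales roughly
# linearly with it, so halving the batch roughly halves the peak.
train_loader = data.DataLoader(
    train_dataset,   # hypothetical dataset object from dataset.py
    batch_size=16,   # was 32
    shuffle=True,
    num_workers=4,   # assumed
)
```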
@guangzhili
When I change the two batch_size values in dataset.py from 32 to 16, I get the following error. Why?
[phung@archlinux pytorch-pruning]$ python finetune.py --train
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:562: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
warnings.warn("The use of the transforms.RandomSizedCrop transform is deprecated, " +
Epoch: 0
Accuracy: 0.3398
Epoch: 1
Accuracy: 0.8265
Epoch: 2
Accuracy: 0.6071
Epoch: 3
Accuracy: 0.63
Epoch: 4
Accuracy: 0.5951
Epoch: 5
Accuracy: 0.5837
Epoch: 6
Accuracy: 0.5537
Epoch: 7
Accuracy: 0.5672
Epoch: 8
Accuracy: 0.506
Epoch: 9
Accuracy: 0.5962
Epoch: 10
Accuracy: 0.6039
Epoch: 11
Accuracy: 0.5436
Epoch: 12
Accuracy: 0.6215
Epoch: 13
Accuracy: 0.5622
Epoch: 14
Accuracy: 0.5872
Epoch: 15
Accuracy: 0.5969
Epoch: 16
Accuracy: 0.5741
Epoch: 17
Accuracy: 0.5725
Epoch: 18
Accuracy: 0.6213
Epoch: 19
Accuracy: 0.6483
Finished fine tuning.
[phung@archlinux pytorch-pruning]$ python finetune.py --prune
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:562: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
warnings.warn("The use of the transforms.RandomSizedCrop transform is deprecated, " +
Accuracy: 0.6483
Number of prunning iterations to reduce 67% filters 5
Ranking filters..
Traceback (most recent call last):
File "finetune.py", line 270, in <module>
fine_tuner.prune()
File "finetune.py", line 217, in prune
prune_targets = self.get_candidates_to_prune(num_filters_to_prune_per_iteration)
File "finetune.py", line 186, in get_candidates_to_prune
self.prunner.normalize_ranks_per_layer()
File "finetune.py", line 101, in normalize_ranks_per_layer
v = v / np.sqrt(torch.sum(v * v))
File "/usr/lib/python3.7/site-packages/torch/tensor.py", line 432, in __array__
return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[phung@archlinux pytorch-pruning]$
I have already installed the latest PyTorch, since it was supposed to have solved this tensor.cpu() problem.
I am using https://www.archlinux.org/packages/community/x86_64/python-pytorch-cuda/
So what is actually still triggering this tensor.cpu() issue?
In v = v / np.sqrt(torch.sum(v * v)), replace np.sqrt(torch.sum(v * v)) with v.norm(). It worked for me. I think np.sqrt() requires a tensor on the CPU, not the GPU.
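Concretely, the fixed normalize_ranks_per_layer would look something like this sketch (only the division line is taken from the traceback; the surrounding loop body is assumed from context):

```python
import torch

def normalize_ranks_per_layer(self):
    # Assumed structure: self.filter_ranks maps a layer index to a CUDA
    # tensor of per-filter ranks.
    for i in self.filter_ranks:
        v = torch.abs(self.filter_ranks[i])
        # v.norm() computes sqrt(sum(v * v)) entirely on the GPU, avoiding
        # the implicit CUDA-tensor-to-numpy conversion that np.sqrt()
        # triggers on a GPU tensor.
        v = v / v.norm()
        self.filter_ranks[i] = v.cpu()
```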
I observed that the out-of-memory error still occurs even after I change the batch size to 16. The first pruning round was OK, but the second wasn't. I think we should delete the previous, now-unused model on the GPU to free up memory before allocating the new one, as in the sketch below.
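A minimal sketch of that idea, assuming the old model is the only remaining reference to its GPU tensors (build_pruned_model is a hypothetical stand-in for however the next round's model gets constructed):

```python
import torch

del model                      # drop the last Python reference to the old model
torch.cuda.empty_cache()       # return the freed blocks from PyTorch's caching
                               # allocator back to the CUDA driver
model = build_pruned_model()   # hypothetical constructor for the next round
```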
I met a similar issue, and solved it by setting pin_memory=False.
https://discuss.pytorch.org/t/using-pined-memory-causes-out-of-memory-error-even-though-batch-size-is-set-to-low-values/30602
Could you clarify where pin_memory is set? How can I change it to False?
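pin_memory is an argument to torch.utils.data.DataLoader, so it would be set wherever dataset.py constructs its loaders. A sketch, with the dataset object and other settings assumed:

```python
import torch.utils.data as data

loader = data.DataLoader(
    train_dataset,      # hypothetical dataset object from dataset.py
    batch_size=16,
    shuffle=True,
    num_workers=4,      # assumed
    pin_memory=False,   # skip pinned (page-locked) host staging buffers,
                        # which is what THCCachingHostAllocator was failing
                        # to allocate in the original traceback
)
```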