
CUDA out of memory

Open s692833 opened this issue 5 years ago • 3 comments

How to fix it?

The error messages are below:

```
Number of Layers
Conv2d : 293 layers
BatchNorm2d : 292 layers
ReLU : 261 layers
Bottleneck : 4 layers
BasicBlock : 104 layers
Upsample : 28 layers
HighResolutionModule : 8 layers
=> load 22246 samples
=> load 2958 samples
Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 187, in main
    final_output_dir, tb_log_dir, writer_dict)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/core/function.py", line 43, in train
    outputs = model(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 448, in forward
    y_list = self.stage3(x_list)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 252, in forward
    x[i] = self.branches[i](x[i])
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 49, in forward
    out = self.bn2(out)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 10.91 GiB total capacity; 10.25 GiB already allocated; 25.50 MiB free; 10.90 MiB cached)
```

s692833 avatar Aug 31 '19 10:08 s692833

You can reduce the batch_size.

consistent1997 avatar Oct 09 '19 12:10 consistent1997
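
To see why this helps: activation memory scales with the number of samples in each forward pass, so halving the batch size roughly halves the per-step footprint, at the cost of more steps per epoch. A minimal illustration with a generic PyTorch `DataLoader` (not this repo's config-driven loader, which sets the batch size via YAML):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 64 dummy image-shaped samples (batch dimension is what OOMs on the GPU).
dataset = TensorDataset(torch.randn(64, 3, 32, 32))

big = DataLoader(dataset, batch_size=32)
small = DataLoader(dataset, batch_size=16)

# Fewer samples per batch -> smaller tensors in memory, more steps per epoch.
print(len(big), len(small))                    # 2 4
print(next(iter(small))[0].shape)              # torch.Size([16, 3, 32, 32])
```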

Modify the file `experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml`:

```yaml
TRAIN:
  BATCH_SIZE_PER_GPU: 16
```

atomtony avatar Mar 16 '20 09:03 atomtony

Does reducing batch_size affect the performance of the model?

zifeiyu-tan avatar Sep 21 '20 06:09 zifeiyu-tan
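
It can: smaller batches change the BatchNorm statistics seen during training and may interact with the learning rate. One common mitigation (a generic PyTorch sketch, not code from this repo) is gradient accumulation, which reproduces the gradient of a larger batch while only holding a micro-batch in memory at a time. Note it does not fix the BatchNorm issue, since each micro-batch still normalizes over fewer samples.

```python
import torch

torch.manual_seed(0)
model_a = torch.nn.Linear(4, 2)
model_b = torch.nn.Linear(4, 2)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

x = torch.randn(8, 4)
y = torch.randn(8, 2)
loss_fn = torch.nn.MSELoss()

# Reference: full batch of 8 in one step.
loss_fn(model_a(x), y).backward()

# Same 8 samples as two micro-batches of 4; dividing each loss by the
# number of accumulation steps makes the summed gradients match the
# full-batch mean loss.
accum_steps = 2
for i in range(accum_steps):
    xb, yb = x[i * 4:(i + 1) * 4], y[i * 4:(i + 1) * 4]
    (loss_fn(model_b(xb), yb) / accum_steps).backward()

print(torch.allclose(model_a.weight.grad, model_b.weight.grad, atol=1e-6))  # True
```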