How can I fix this? The error message is below:
Number of Layers
Conv2d: 293 layers
BatchNorm2d: 292 layers
ReLU: 261 layers
Bottleneck: 4 layers
BasicBlock: 104 layers
Upsample: 28 layers
HighResolutionModule: 8 layers
=> load 22246 samples
=> load 2958 samples
Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 187, in main
    final_output_dir, tb_log_dir, writer_dict)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/core/function.py", line 43, in train
    outputs = model(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 448, in forward
    y_list = self.stage3(x_list)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 252, in forward
    x[i] = self.branches[i](x[i])
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/s10559003/deep-high-resolution-net.pytorch/tools/../lib/models/pose_hrnet.py", line 49, in forward
    out = self.bn2(out)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/home/anaconda3/envs/s10559003_deep/lib/python3.6/site-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 10.91 GiB total capacity; 10.25 GiB already allocated; 25.50 MiB free; 10.90 MiB cached)
The GPU is running out of memory, so you can reduce the batch size. Modify the file experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml:

TRAIN:
  BATCH_SIZE_PER_GPU: 16
Does reducing the batch size affect the performance of the model?
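A smaller batch can change training dynamics (noisier gradients, and an interaction with the learning rate), but you can keep the *effective* batch size unchanged with gradient accumulation: run several smaller micro-batches and call `optimizer.step()` once. This is a generic PyTorch pattern, not code from this repo; the tiny `nn.Linear` model below is a hypothetical stand-in for the HRNet model, used only to show the pattern.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                      # hypothetical stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

inputs = torch.randn(16, 4)                  # one "full" batch of 16 samples
targets = torch.randn(16, 2)

accum_steps = 4                              # 4 micro-batches of 4 = effective batch of 16
optimizer.zero_grad()
for micro_in, micro_tgt in zip(inputs.chunk(accum_steps), targets.chunk(accum_steps)):
    loss = criterion(model(micro_in), micro_tgt)
    # Scale each micro-batch loss so the summed gradients equal the
    # full-batch (mean-reduced) gradient.
    (loss / accum_steps).backward()
optimizer.step()                             # one optimizer step per effective batch
```

Each micro-batch only needs enough GPU memory for its own activations, which is why this trades memory for extra forward/backward passes rather than changing the effective batch size.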