HRNet-Semantic-Segmentation

RuntimeError: CUDA out of memory. Tried to allocate 900.00 MiB (GPU 0; 10.92 GiB total capacity; 7.83 GiB already allocated; 711.50 MiB free; 9.66 GiB reserved in total by PyTorch)

Open • EricHuiK opened this issue 4 years ago • 8 comments

EricHuiK, Sep 04 '20 07:09
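
The numbers in the error message can be read off with a short script. This is a minimal sketch that only parses the text quoted in the issue title; the function name `sizes_in_mib` is made up for illustration. It shows that the failed request (900 MiB) exceeds the free memory (711.50 MiB), and that PyTorch holds more reserved memory than its tensors currently use.

```python
import re

# Error text copied from the issue title.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 900.00 MiB "
    "(GPU 0; 10.92 GiB total capacity; 7.83 GiB already allocated; "
    "711.50 MiB free; 9.66 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def sizes_in_mib(message: str) -> dict:
    """Return the sizes named in the message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        value, unit = re.search(pattern, message).groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = sizes_in_mib(MESSAGE)
# How much more memory the failed allocation needed than was free.
shortfall = sizes["requested"] - sizes["free"]
# Memory held by PyTorch's caching allocator but not occupied by tensors.
cached = sizes["reserved"] - sizes["allocated"]
print(f"shortfall: {shortfall:.2f} MiB")
print(f"reserved but not allocated: {cached:.2f} MiB")
```

The request was about 188.5 MiB larger than the free memory, so even a modest reduction in per-step memory may be enough on this GPU.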

Has anyone found a solution for this? Do we need to change any params to solve it?

GutlapalliNikhil, Oct 28 '20 03:10

I think you should reduce the batch size.

hieunm1821, Dec 11 '20 09:12

@hieunm1821 Yes, we can reduce either the batch size or the training resolution. Both approaches will work.
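
As a rough sketch of why both options help: under the simplifying assumption that activation memory in a convolutional network grows in proportion to batch size × height × width (weights and optimizer state stay fixed), the relative saving can be estimated as below. The baseline of batch size 4 at 512×1024 is a hypothetical example, not a value taken from this repository.

```python
def relative_activation_memory(batch, height, width,
                               base_batch, base_height, base_width):
    """Ratio of activation memory to a baseline, assuming it scales
    linearly with batch * height * width (a first-order estimate only)."""
    return (batch * height * width) / (base_batch * base_height * base_width)


# Hypothetical baseline: batch size 4 with 512x1024 training crops.
BASE = dict(base_batch=4, base_height=512, base_width=1024)

half_batch = relative_activation_memory(2, 512, 1024, **BASE)
half_resolution = relative_activation_memory(4, 256, 512, **BASE)
print(half_batch, half_resolution)  # 0.5 0.25
```

Under this assumption, halving the batch size halves activation memory, while halving both sides of the training crop cuts it to a quarter.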

NikhilChowdary-MCW, Dec 29 '20 09:12

@EricHuiK Did you find a solution?

Mps24-7uk, May 19 '21 10:05

Open the .yaml file under experiments/[dataset name]/ and set "BATCH_SIZE_PER_GPU" to a lower value. Then run training as python -m torch.distributed.launch --nproc_per_node=1 tools/train.py ...
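
If editing by hand is inconvenient, the change can be scripted. The sketch below is only an illustration: `SAMPLE_CONFIG` is a hypothetical excerpt (the real file has many more keys), and `set_batch_size_per_gpu` is a made-up helper that rewrites the value with a regular expression rather than a YAML parser.

```python
import re

# Hypothetical config excerpt; only the key name comes from the thread.
SAMPLE_CONFIG = """\
TRAIN:
  BATCH_SIZE_PER_GPU: 3
  SHUFFLE: true
"""


def set_batch_size_per_gpu(config_text: str, new_value: int) -> str:
    """Rewrite every `BATCH_SIZE_PER_GPU: <int>` line, keeping indentation."""
    pattern = re.compile(r"^([ \t]*BATCH_SIZE_PER_GPU:[ \t]*)\d+",
                         flags=re.MULTILINE)
    # A function replacement avoids ambiguity between the group
    # reference and the digits of the new value.
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_value}", config_text
    )
    if count == 0:
        raise ValueError("BATCH_SIZE_PER_GPU not found in config text")
    return updated


updated_config = set_batch_size_per_gpu(SAMPLE_CONFIG, 1)
print(updated_config)
```

Note that this rewrites every matching line, so if the key appears in more than one section of the file, all of them receive the new value.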

A-Kerim, Jul 02 '22 12:07

Hi @A-Kerim, I'm trying to run training on a single GPU on Windows 11 (just to see if it runs) and I'm getting an OutOfMemoryError. I already reduced the batch size to 2 and still get the same error. Do you have any suggestions on how to solve this? Thanks!

Arshadoid, Nov 30 '22 15:11

@Arshadoid Give it a try with batch size 1.

GutlapalliNikhil, Dec 01 '22 03:12

Hi @GutlapalliNikhil, thanks for the suggestion. I get an error about BatchNorm when I try a batch size of 1.
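
The exact BatchNorm error is not quoted here. One plausible cause, stated as an assumption, is that a batch normalization layer in training mode receives only one value per channel (for example, batch size 1 with a 1×1 feature map), which PyTorch rejects with a message along the lines of "Expected more than 1 value per channel when training". The pure-Python sketch below, with a made-up helper `batch_norm_channel`, shows why a single value carries no usable statistics: it always normalizes to zero.

```python
def batch_norm_channel(values, eps=1e-5):
    """Normalize one channel's values with training-mode batch statistics."""
    mean = sum(values) / len(values)
    # Biased variance, as used for normalization during training.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


single = batch_norm_channel([0.7])     # one value per channel
pair = batch_norm_channel([0.7, 1.3])  # two values per channel
print(single)  # the input equals its own mean, so the output is [0.0]
print(pair)
```

If this is the cause, keeping the batch size at 2 and lowering the training resolution instead, as suggested earlier in the thread, would avoid the single-sample case.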

Arshadoid, Dec 02 '22 00:12