2023-09-23 02:12:42.767428: Epoch 0
2023-09-23 02:12:42.767778: Current learning rate: 0.01
using pin_memory on device 0
Traceback (most recent call last):
File "/data/revan/miniconda3/envs/nnUNet/bin/nnUNetv2_train", line 8, in
sys.exit(run_training_entry())
File "/data/revan/experiments/CMR_experiments/nnUNetFrame/nnUNet/nnunetv2/run/run_training.py", line 268, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/data/revan/experiments/CMR_experiments/nnUNetFrame/nnUNet/nnunetv2/run/run_training.py", line 204, in run_training
nnunet_trainer.run_training()
File "/data/revan/experiments/CMR_experiments/nnUNetFrame/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1240, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/data/revan/experiments/CMR_experiments/nnUNetFrame/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 881, in train_step
output = self.network(data)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/dynamic_network_architectures/architectures/unet.py", line 60, in forward
return self.decoder(skips)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/dynamic_network_architectures/building_blocks/unet_decoder.py", line 84, in forward
x = self.stages[s](x)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/dynamic_network_architectures/building_blocks/simple_conv_blocks.py", line 137, in forward
return self.convs(x)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/dynamic_network_architectures/building_blocks/simple_conv_blocks.py", line 71, in forward
return self.all_modules(x)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/data/revan/miniconda3/envs/nnUNet/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 1.02 GiB already allocated; 20.75 MiB free; 1.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Could anyone help me resolve the above issue?
I also encountered this error during inference. I attempted to use 2 GPUs, but it does not appear to be working correctly.
What can I do?
Hi revanb88,
It is hard to say without knowing more details, but could this issue solve your problem?: https://github.com/MIC-DKFZ/nnUNet/issues/337
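As a minimal sketch only (not an official nnUNet recommendation): the error message itself suggests trying the PyTorch allocator setting max_split_size_mb to reduce fragmentation. Assuming a bash-like shell, you could export it before launching training; the value 128 and the dataset/configuration/fold arguments below are placeholders you would replace with your own:

    # reduce allocator fragmentation (value is an example, tune as needed)
    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    # relaunch training with your own dataset id, configuration and fold
    nnUNetv2_train DATASET_ID 2d FOLD

Note that with only ~20 MiB free on a 14.76 GiB GPU, the more likely fix is freeing GPU memory (other processes) or reducing the memory footprint (e.g. a smaller batch or patch size in the plans), as discussed in the linked issue.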