
Prediction on device was unsuccessful, probably due to a lack of memory.

Open elpequeno opened this issue 9 months ago • 5 comments

Hi Fabian, hi everybody,

I am experiencing some issues I have never seen before. I am training nnUNet on VerSe2020 right now. Training seems to work perfectly fine, but during the validation at the end of training I noticed that I ran out of memory. I have 187 GB of RAM and have never had issues before. I then manually ran a prediction on just one image and got the following output:

nnUNetv2_predict -i /data/test -o /data/test_out -d 556 -c 3d_fullres --verbose

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

There are 1 cases in the source folder
I am process 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict
old shape: (379, 512, 512), new_shape: [1236  767  767], old_spacing: [3.259999990463257, 1.3671879768371582, 1.3671879768371582], new_spacing: [1.0, 0.912109375, 0.912109375], fn_data: functools.partial(<function resample_data_or_seg_to_shape at 0x7ff2a6562de0>, is_seg=False, order=3, order_z=0, force_separate_z=None)

Predicting MM256_276_0:
perform_everything_on_device: True
Input shape: torch.Size([1, 1236, 767, 767])
step_size: 0.5
mirror_axes: (0, 1, 2)
n_steps 2299, image size is torch.Size([1236, 767, 767]), tile_size [128, 128, 128], tile_step_size 0.5
steps:
[[0, 62, 123, 185, 246, 308, 369, 431, 492, 554, 616, 677, 739, 800, 862, 923, 985, 1046, 1108], [0, 64, 128, 192, 256, 320, 383, 447, 511, 575, 639], [0, 64, 128, 192, 256, 320, 383, 447, 511, 575, 639]]
move image to device cuda
preallocating results arrays on device cuda
Prediction on device was unsuccessful, probably due to a lack of memory. Moving results arrays to CPU
move image to device cpu
preallocating results arrays on device cpu
running prediction
  0%|                                                                                                                                                                                      | 0/2299 [00:00<?, ?it/s]
/data/an55321/anaconda3/env/nnunetV2/lib/python3.12/site-packages/torch/nn/modules/conv.py:605: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv3d(

I ran several trainings on the same machine before and never had issues. I thought it might be related to a recent NVIDIA driver update we did, so I also updated PyTorch and nnUNet:

...
nnunetv2                      2.3.1
...
torch                         2.3.0
torchaudio                    2.3.0
torchvision                   0.18.0

Any help would be appreciated.

Thanks, André

elpequeno avatar May 01 '24 16:05 elpequeno

Hi André @elpequeno,

did you happen to run watch -n 0.1 nvidia-smi during the inference of this (rather large) volume? It does indeed seem to be quite memory-intensive, given the large number of steps required with the 128-cubed tiles. This might be a case where it would make sense to reduce the tile_step_size. Please let me know if the behavior looks any different with a changed step size!
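
For example, reusing your command from above (the value 0.25 is only illustrative):

nnUNetv2_predict -i /data/test -o /data/test_out -d 556 -c 3d_fullres -step_size 0.25 --verbose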

GregorKoehler avatar May 02 '24 14:05 GregorKoehler

Hi Gregor,

thank you for your response. Yes, I watched nvidia-smi during the inference. The inference usually takes a long time (several minutes) between the "There are 1 cases that I would like to predict" line and the "old shape: (379, 512, 512), new_shape: [1236 767 767], ..." line.

Only after that do I see any activity on the GPU (I guess that is normal). Then what I see looks similar to the following. Memory usage is usually around 5000 MiB and GPU-Util jumps up and down, but is close to 100% most of the time.

Every 0.1s: nvidia-smi                                                                  itr-gpu01: Fri May  3 04:33:59 2024

Fri May  3 04:33:59 2024

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100S-PCIE-32GB          On  |   00000000:3B:00.0 Off |                    0 |
| N/A   38C    P0            219W /  250W |    5033MiB /  32768MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100S-PCIE-32GB          On  |   00000000:D8:00.0 Off |                    0 |
| N/A   24C    P0             24W /  250W |       3MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3564877      C   ...1/anaconda3/env/nnunetV2/bin/python       5022MiB |
+-----------------------------------------------------------------------------------------+

I tried -step_size 0.1 and -step_size 0.3 but did not see much of a difference. However, I saw this line in the output: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 39.28 GiB. GPU ...", so I think the behaviour of nnUNet is correct. I am just not sure how to deal with this.
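
For what it's worth, here is a rough back-of-the-envelope check of where the 39.28 GiB could come from. I am assuming half-precision logits and roughly 29 output channels (background plus 28 vertebra labels); both numbers are assumptions on my part, not taken from the log:

# Back-of-the-envelope check (Python). Assumptions, not from the log:
# half-precision logits (2 bytes per voxel) and 29 output channels.
voxels = 1236 * 767 * 767               # resampled shape reported by nnUNet
num_channels = 29                       # assumed: background + 28 vertebrae
bytes_total = voxels * 2 * num_channels
print(bytes_total / 1024**3)            # ~39.3 GiB, close to the 39.28 GiB in the OOM error

If that is roughly right, the preallocated results array depends only on the resampled image size and the number of classes, not on -step_size, which would also explain why changing the step size made no difference.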

elpequeno avatar May 03 '24 08:05 elpequeno

Hi @elpequeno,

I think your best bet to reduce memory consumption is to divide this large volume into chunks, predict those separately with nnUNet, and then merge the results. Luckily, @Karol-G has written a nice tool to help in such cases: https://github.com/MIC-DKFZ/patchly. You can take a look at the README for examples of how to use it!
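
If you want to get a feel for the general idea before diving into patchly, below is a minimal sketch of chunked inference along the z-axis. Note that this is not the patchly API: predict_logits is a hypothetical stand-in for whatever nnU-Net prediction call you end up using, and patchly handles the chunk bookkeeping (and blending of overlaps) far more robustly.

import numpy as np

def predict_in_chunks(volume, predict_logits, chunk=256, halo=64):
    # volume: (Z, Y, X) array. predict_logits is a *hypothetical* callable
    # that returns per-class logits of shape (C, z, y, x) for a sub-volume.
    Z, Y, X = volume.shape
    out = None
    for z0 in range(0, Z, chunk):
        z1 = min(z0 + chunk, Z)
        lo, hi = max(z0 - halo, 0), min(z1 + halo, Z)  # halo adds context around each chunk
        logits = predict_logits(volume[lo:hi])         # shape (C, hi - lo, Y, X)
        if out is None:
            out = np.zeros((logits.shape[0], Z, Y, X), dtype=np.float16)
        out[:, z0:z1] = logits[:, z0 - lo:z1 - lo]     # keep only the inner region
    return out.argmax(0)                               # final label map

The halo is there so that predictions near chunk borders still see some surrounding context; without it, the seams between chunks tend to look noticeably worse.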

GregorKoehler avatar Jun 04 '24 12:06 GregorKoehler

Hi André @elpequeno,

just checking in. Could the patchly recommendation help in solving the OOM error?

GregorKoehler avatar Jun 23 '24 20:06 GregorKoehler

Hi @HussainAlasmawi, please move your case to its own issue in order to keep issues clean and readable. If there's a connection between issues, you can always add a link to a potentially related issue.

GregorKoehler avatar Jul 16 '24 07:07 GregorKoehler

Due to this issue being stale for a while, I'll close it for now. Feel free to re-open if you still face this issue.

GregorKoehler avatar Sep 05 '24 20:09 GregorKoehler